Electricity Consumption Analysis
  • Home

On this page

  • I. Data
  • II. Data Science Questions
  • III. Data Preparation
    • A. Import Libraries
    • B. Data Processing
      • i. Extracting Data and Creating Dataframe
      • ii. Data Cleaning and Feature Engineering
  • IV. Exploratory Data Analysis
    • A. Statistical Analysis
    • B. Visual Analysis
      • i. Time Series Analysis
      • ii. Appliance Analysis
  • V. Results and Inferences

Electricity Consumption Analysis

  • Show All Code
  • Hide All Code

  • View Source
Author

Ramdayal Rewaria

I. Data

The ECO (Electricity Consumption and Occupancy) dataset is a valuable resource for non-intrusive load monitoring and occupancy detection research. The dataset was collected in six Swiss households over a period of eight months, providing a comprehensive view of energy consumption patterns and occupancy information. The ECO dataset is open-source and provided under the Creative Commons License CC BY 4.0, which means that researchers can freely use the data for non-commercial purposes, as long as they give credit to the authors. [Kleiminger, Beckel, and Santini (2015)](Beckel et al. 2014)

The ECO dataset contains three types of data: 1 Hz aggregate consumption data, 1 Hz plug-level data, and occupancy information. The 1 Hz aggregate consumption data contains data on current, voltage, and phase shift for each of the three phases in the household. This data was collected from the smart meter in each household and provides a high-level view of the overall energy consumption patterns in the home.

The 1 Hz plug-level data was collected from selected appliances in each household, providing a more detailed view of energy consumption patterns for specific devices. This data can be used to identify opportunities for energy savings and to develop more accurate load monitoring algorithms.

The occupancy information was collected through a tablet computer and a passive infrared sensor in some of the households. The tablet computer was used for manual labeling of occupancy information, while the passive infrared sensor provided automatic detection of occupancy information. This information can be used to develop occupancy detection algorithms that can be used to optimize energy consumption in the home.

For this project, I will focus on households 4, 5, and 6 and explore the dataset using Python and various visualization libraries.

II. Data Science Questions

  1. What is the overall trend in electricity consumption for households 4, 5, and 6 over the 8-month period?

  2. Which appliances consume the most electricity in each household, and how does this vary over time?

  3. Are there any anomalies or outliers in the data that may require further investigation?

  4. Is there a difference in electricity consumption patterns of appliances for weekdays and weekends?

  5. Is there a difference in electricity consumption patterns of appliances for holidays and non-holidays?

  6. Are the electricity consumption patterns of appliances related to seasons?

III. Data Preparation

The dataset is provided in the form of folders for each household (smart meter and plug-level data). [Kleiminger, Beckel, and Santini (2015)](Beckel et al. 2014)

For the smart meter data, each CSV file contains aggregate consumption data of a single day and is named accordingly (e.g., “2012-08-31.csv”). Each file contains 86,400 rows (i.e., one row for each second). Rows with missing measurements are denoted as “-1”. The coverage specifies the proportion of “existing” values, i.e., values that are > -1. The rows consist of the following comma separated values:

  • powerallphases: Sum of real power over all phases
  • powerl1: Real power phase 1
  • powerl2: Real power phase 2
  • powerl3: Real power phase 3
  • currentneutral: Neutral current
  • currentl1: Current phase 1
  • currentl2: Current phase 2
  • currentl3: Current phase 3
  • voltagel1: Voltage phase 1
  • voltagel2: Voltage phase 2
  • voltagel3: Voltage phase 3
  • phaseanglevoltagel2l1: Phase shift between voltage on phase 2 and 1
  • phaseanglevoltagel3l1: Phase shift between voltage on phase 3 and 1
  • phaseanglecurrentvoltagel1: Phase shift between current/voltage on phase 1
  • phaseanglecurrentvoltagel2: Phase shift between current/voltage on phase 2
  • phaseanglecurrentvoltagel3: Phase shift between current/voltage on phase 3

Like the smart meter data, the plug data is also provided on a daily based (i.e., 86,400 measurements per day) in CSV format. In contrast to the smart meter data, it contains only one value:

  • consumption: Real power measured by the plug

Each file contains plug measurements for the whole household. The (daily) data for each plug is stored in subfolders named after the plug ID (e.g., “05”) described above.

A. Import Libraries

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import altair as alt

pio.renderers.default = "plotly_mimetype+notebook_connected"

import glob
import datetime
import warnings

warnings.filterwarnings("ignore")

B. Data Processing

The raw data needs to be processed and cleaned before it can be used for analysis.

i. Extracting Data and Creating Dataframe

The data is extracted from the CSV files and stored in a Pandas dataframe. 4 functions are used to extract the data from the CSV files and create the dataframe:

  • extract_plugs_data: Extracts the data from the CSV files of plug-level data and stores it in a Pandas dataframe.
  • extract_sm_data: Extracts the data from the CSV files of smart meter data and stores it in a Pandas dataframe.
  • create_plugs_df: Combines the data from the CSV files of plug-level data into a single dataframe.
  • create_sm_df: Combines the data from the CSV files of smart meter data into a single dataframe.
Code
# Define a function to extract the data from a CSV file and format it into a dataframe


def extract_plugs_data(csv_file_path, household):
    """
    Reads a CSV file located at the given file path and extracts plug-level data from it.
    It then adds additional columns to the DataFrame, including the date of the data,
    the household number, and the appliance type.

    Returns:
        _type_: pandas DataFrame
    """

    df = pd.read_csv(csv_file_path, header=None)
    date_str = csv_file_path.split("/")[-1].replace(".csv", "")
    date = datetime.datetime.strptime(date_str, "%Y-%m-%d")
    subfolder = csv_file_path.split("/")[-2]
    df["Date"] = date
    df["Household"] = household
    df["Appliance"] = subfolder
    df.columns = ["Reading", "Date", "Household", "Appliance"]

    return df


def extract_sm_data(csv_file_path, household):
    """
    Reads a CSV file located at the given file path and extracts smart meter data from it.
    It then adds additional columns to the DataFrame, including the date of the data and
    the household number.

    Returns:
        _type_: pandas DataFrame
    """

    df = pd.read_csv(csv_file_path, header=None)
    date_str = csv_file_path.split("/")[-1].replace(".csv", "")
    date = datetime.datetime.strptime(date_str, "%Y-%m-%d")
    df["Date"] = date
    df["Household"] = household
    df.columns = [
        "powerallphases",
        "powerl1",
        "powerl2",
        "powerl3",
        "currentneutral",
        "currentl1",
        "currentl2",
        "currentl3",
        "voltagel1",
        "voltagel2",
        "voltagel3",
        "phaseanglevoltagel2l1",
        "phaseanglevoltagel3l1",
        "phaseanglecurrentvoltagel1",
        "phaseanglecurrentvoltagel2",
        "phaseanglecurrentvoltagel3",
        "Date",
        "Household",
    ]
    # Drop the columns that are not needed
    df.drop(
        [
            "powerl1",
            "powerl2",
            "powerl3",
            "currentneutral",
            "currentl1",
            "currentl2",
            "currentl3",
            "voltagel1",
            "voltagel2",
            "voltagel3",
            "phaseanglevoltagel2l1",
            "phaseanglevoltagel3l1",
            "phaseanglecurrentvoltagel1",
            "phaseanglecurrentvoltagel2",
            "phaseanglecurrentvoltagel3",
        ],
        axis=1,
        inplace=True,
    )
    return df


def create_plugs_df(folder_list, household_list):
    """
    This function takes in a list of folder paths and a corresponding list of
    household IDs as input. It then loops over each folder path, extracts data from CSV
    files in the subfolders, and creates a list of dataframes. These dataframes are then
    combined into a single dataframe using the pandas concat() function and returned as output.

    Returns:
        _type_: pandas DataFrame
    """

    # Create a list to store the extracted dataframes
    df_list = []

    # Loop over each folder path and extract the data from the CSV files
    for i in range(len(folder_list)):
        folder_path = folder_list[i]
        household = household_list[i]
        subfolder_paths = glob.glob(folder_path + "/*")

        for subfolder_path in subfolder_paths:
            csv_files = glob.glob(subfolder_path + "/*.csv")

            for csv_file in csv_files:
                df = extract_plugs_data(csv_file, household)
                df_list.append(df)

    # Combine the extracted dataframes into a single dataframe
    plugs_df = pd.concat(df_list, ignore_index=True)

    return plugs_df


def create_sm_df(folder_list, household_list):
    """
    This function takes in a list of folder paths and a corresponding list of
    household IDs as input. It then loops over each folder path, extracts data from CSV
    files, and creates a list of dataframes. These dataframes are then
    combined into a single dataframe using the pandas concat() function and returned as output.

    Returns:
        _type_: pandas DataFrame
    """

    # Create a list to store the extracted dataframes
    df_list = []

    # Loop over each folder path and extract the data from the CSV files
    for i in range(len(folder_list)):
        folder_path = folder_list[i]
        household = household_list[i]
        csv_files = glob.glob(folder_path + "/*.csv")

        for csv_file in csv_files:
            df = extract_sm_data(csv_file, household)
            df_list.append(df)

    # Combine the extracted dataframes into a single dataframe
    sm_df = pd.concat(df_list, ignore_index=True)

    return sm_df


# Set the paths to the folders containing the data
folder_paths_sm = ["eco/04_sm", "eco/05_sm", "eco/06_sm"]
folder_paths_plugs = ["eco/04_plugs", "eco/05_plugs", "eco/06_plugs"]
household_nums = ["04", "05", "06"]

# Store the data in a dataframe
sm_df = create_sm_df(folder_paths_sm, household_nums)
plugs_df = create_plugs_df(folder_paths_plugs, household_nums)

# Store it in new dataframes to preserve the original data
df1 = sm_df.copy()
df2 = plugs_df.copy()

Raw Smart Meter data:

Code
df1.head()
powerallphases Date Household
0 344.297 2012-08-16 04
1 343.635 2012-08-16 04
2 344.199 2012-08-16 04
3 344.081 2012-08-16 04
4 344.183 2012-08-16 04

Raw Plugs Data:

Code
df2.head()
Reading Date Household Appliance
0 0.00000 2012-08-16 04 03
1 0.00000 2012-08-16 04 03
2 0.00000 2012-08-16 04 03
3 2.23978 2012-08-16 04 03
4 0.00000 2012-08-16 04 03

ii. Data Cleaning and Feature Engineering

Dealing with missing values:

In the plugs level data, the Readings on 25th July 2012 are missing. All the rows of the 25th July 2012 are removed from the data. Every other missing value is imputed as -1 (as per the dataset description [Kleiminger, Beckel, and Santini (2015)](Beckel et al. 2014)). These values are imputed as 0 to avoid any anomalies while doing analysis.

Code
df1["powerallphases"][df1["powerallphases"] == -1] = 0
df2["Reading"][df2["Reading"] == -1] = 0

# Drop all rows with Date as 2012-06-25
df2 = df2[df2["Date"] != "2012-06-25"]

Down-sizing the data:

Since for each day, there are 86,400 rows, the data should be reduced so that it is not computationallly expensive to do analysis. Here is the shape of the 2 dataframes before downsizing:

Code
print("The shape of Smart Meter data before downsizing is:", sm_df.shape)
print("The shape of Plugs level data before downsizing is:", plugs_df.shape)
The shape of Smart Meter data before downsizing is: (51840000, 3)
The shape of Plugs level data before downsizing is: (344908800, 4)

For down-sizing, the 86,400 rows are grouped into 24 groups of 3600 rows each. The mean of each Readings are taken and stored in a new dataframe. This is done for both the smart meter and plug level data.

Code
# Create a new column that contains the hour of the day

df1["Hour"] = (df1.groupby(["Household", "Date"]).cumcount() // 3600) + 1
df1 = df1.groupby(["Household", "Date", "Hour"], as_index=False).mean()

df2["Hour"] = (df2.groupby(["Household", "Date", "Appliance"]).cumcount() // 3600) + 1
df2 = df2.groupby(["Household", "Date", "Appliance", "Hour"], as_index=False).mean()

Here is the shape of the 2 dataframes after downsizing:

Code
print("The shape of Smart Meter data after downsizing is:", df1.shape)
print("The shape of Plugs level data after downsizing is:", df2.shape)
The shape of Smart Meter data after downsizing is: (14400, 4)
The shape of Plugs level data after downsizing is: (95784, 5)

Feature Engineering:

Many meaningful data can be extracted from the existing columns.

Since the Date column is already is datetime format, the Day of the week can be extracted from it. From this day of the week, we can extract the weekend and weekday information. This information can be used to see if there is any difference in the electricity consumption on weekends and weekdays.

Code
df1["Day"] = ""

for i in range(len(df1)):
    day = df1["Date"][i].weekday()

    if day < 5:
        df1["Day"][i] = "Weekday"

    else:
        df1["Day"][i] = "Weekend"

df2["Day"] = ""

for i in range(len(df2)):
    day = df2["Date"][i].weekday()

    if day < 5:
        df2["Day"][i] = "Weekday"

    else:
        df2["Day"][i] = "Weekend"

The Date column can also be used to add Holiday column. This column can be used to see if there is any difference in the electricity consumption on holidays and non-holidays.

List of Holidays in Switzerland for the period of June 2012 to January 2013 (“Switzerland Holidays,” n.d.):

Date Day Holiday Name Type of Holiday
Jun 7 Thursday Corpus Christi Common local holiday
Jun 23 Saturday Jura Independence Day Common local holiday
Jun 29 Friday St. Peter and St. Paul Common local holiday
Aug 1 Wednesday Swiss National Day National holiday
Aug 15 Wednesday Assumption of Mary Common local holiday
Sep 6 Thursday Genevese Fast Common local holiday
Sep 16 Sunday Swiss Federal Fast Common local holiday
Sep 17 Monday Federal Fast Monday Common local holiday
Sep 22 Saturday Mauritius Day Common local holiday
Sep 25 Tuesday St. Nicholas of Flüe Common local holiday
Nov 1 Thursday All Saints’ Day Common local holiday
Dec 8 Saturday Immaculate Conception Common local holiday
Dec 25 Tuesday Christmas Day Common local holiday
Dec 26 Wednesday St. Stephen’s Day Common local holiday
Dec 31 Monday Restoration Day Common local holiday
Jan 1 Tuesday New Year’s Day Common local holiday
Jan 2 Wednesday Berchtold Day Common local holiday
Jan 6 Sunday Epiphany Common local holiday
Code
# Make a list for the holidays using above markdown cell

holiday_list = [
    "2012-06-07",
    "2012-06-23",
    "2012-06-29",
    "2012-08-01",
    "2012-08-15",
    "2012-09-06",
    "2012-09-16",
    "2012-09-17",
    "2012-09-22",
    "2012-09-25",
    "2012-11-01",
    "2012-12-08",
    "2012-12-25",
    "2012-12-26",
    "2012-12-31",
    "2013-01-01",
    "2013-01-02",
    "2013-01-06",
]

# Create a new column that contains the holiday information

df1["Holiday"] = ""

for i in range(len(df1)):
    if df1["Date"][i].strftime("%Y-%m-%d") in holiday_list:
        df1["Holiday"][i] = "Holiday"

    else:
        df1["Holiday"][i] = "Non-Holiday"

df2["Holiday"] = ""

for i in range(len(df2)):
    if df2["Date"][i].strftime("%Y-%m-%d") in holiday_list:
        df2["Holiday"][i] = "Holiday"

    else:
        df2["Holiday"][i] = "Non-Holiday"

In Switzerland, the approximate season dates for 2012 and 2013 are as follows:

  • Spring 2012: March 20 - June 19
  • Summer 2012: June 20 - September 21
  • Fall 2012: September 22 - December 1
  • Winter 2012: December 2, 2012 - March 19, 2013
Code
# Define a custom function to map dates to seasons
def get_season(date):
    """
    Creates a new column that contains the season information.
    """

    spring_start = pd.Timestamp(year=date.year, month=3, day=20)
    summer_start = pd.Timestamp(year=date.year, month=6, day=20)
    fall_start = pd.Timestamp(year=date.year, month=9, day=22)
    winter_start = pd.Timestamp(year=date.year, month=12, day=2)

    if date < spring_start or date >= winter_start:
        return "Winter"
    elif date < summer_start:
        return "Spring"
    elif date < fall_start:
        return "Summer"
    else:
        return "Fall"


# Apply the function to the "Date" column and create a new "Season" column
df1["Season"] = df1["Date"].apply(get_season)
df2["Season"] = df2["Date"].apply(get_season)

Convert the Appliance types from 01, 02 and so on to their respective appliance names for better understanding the data:

Code
for i in range(len(df2)):
    if df2["Household"][i] == "04":
        appliance_types = {
            "Fridge": "01",
            "Kitchen appliances": "02",
            "Lamp": "03",
            "Stereo and laptop": "04",
            "Freezer": "05",
            "Tablet": "06",
            "Entertainment": "07",
            "Microwave": "08",
        }
        df2["Appliance"][i] = list(appliance_types.keys())[
            list(appliance_types.values()).index(df2["Appliance"][i])
        ]

    elif df2["Household"][i] == "05":
        appliance_types = {
            "Tablet": "01",
            "Coffee machine": "02",
            "Fountain": "03",
            "Microwave": "04",
            "Fridge": "05",
            "Entertainment": "06",
            "PC": "07",
            "Kettle": "08",
        }

        df2["Appliance"][i] = list(appliance_types.keys())[
            list(appliance_types.values()).index(df2["Appliance"][i])
        ]

    elif df2["Household"][i] == "06":
        appliance_types = {
            "Lamp": "01",
            "Laptop": "02",
            "Router": "03",
            "Coffee machine": "04",
            "Entertainment": "05",
            "Fridge": "06",
            "Kettle": "07",
        }

        df2["Appliance"][i] = list(appliance_types.keys())[
            list(appliance_types.values()).index(df2["Appliance"][i])
        ]

    else:
        pass

I noticed that some appliances are redundant and can be grouped together for better analysis.

The unique appliance types are as follows:

Code
list(df2["Appliance"].unique())
['Fridge',
 'Kitchen appliances',
 'Lamp',
 'Stereo and laptop',
 'Freezer',
 'Tablet',
 'Entertainment',
 'Microwave',
 'Coffee machine',
 'Fountain',
 'PC',
 'Kettle',
 'Laptop',
 'Router']

Appliances like:

  • Microwave, Coffee machine and Kettle can be combined into Kitchen appliances
  • Stereo and laptop, Tablet, PC, Router and Laptop can be combined into Electronics
  • Fridge and Freezer can be combined into Refrigerator
Code
for i in range(len(df2)):
    if (df2["Appliance"][i] == "Fridge") | (df2["Appliance"][i] == "Freezer"):
        df2["Appliance"][i] = "Refrigerator"

    elif (
        (df2["Appliance"][i] == "Stereo and laptop")
        | (df2["Appliance"][i] == "Laptop")
        | (df2["Appliance"][i] == "PC")
        | (df2["Appliance"][i] == "Tablet")
        | (df2["Appliance"][i] == "Router")
    ):
        df2["Appliance"][i] = "Electronics"

    elif (
        (df2["Appliance"][i] == "Microwave")
        | (df2["Appliance"][i] == "Coffee machine")
        | (df2["Appliance"][i] == "Kettle")
        | (df2["Appliance"][i] == "Kitchen appliances")
    ):
        df2["Appliance"][i] = "Kitchen"

    else:
        pass

The updated list of appliances are as follows:

Code
df2 = (
    df2.groupby(["Household", "Date", "Appliance", "Hour", "Day", "Holiday", "Season"])
    .mean()
    .reset_index()
)

list(df2["Appliance"].unique())
['Electronics', 'Entertainment', 'Kitchen', 'Lamp', 'Refrigerator', 'Fountain']

The final dataframes look like this:

Smart Meter Data:

Code
df1.to_csv("Smart_Meter_Data.csv", index=False)
df1.head()
Household Date Hour powerallphases Day Holiday Season
0 04 2012-06-27 1 471.416547 Weekday Non-Holiday Summer
1 04 2012-06-27 2 394.907202 Weekday Non-Holiday Summer
2 04 2012-06-27 3 381.151473 Weekday Non-Holiday Summer
3 04 2012-06-27 4 4495.969601 Weekday Non-Holiday Summer
4 04 2012-06-27 5 1609.837955 Weekday Non-Holiday Summer

Plugs Data:

Code
df2.to_csv("Plugs_Data.csv", index=False)
df2.head()
Household Date Appliance Hour Day Holiday Season Reading
0 04 2012-06-27 Electronics 1 Weekday Non-Holiday Summer 7.670707
1 04 2012-06-27 Electronics 2 Weekday Non-Holiday Summer 7.769636
2 04 2012-06-27 Electronics 3 Weekday Non-Holiday Summer 7.790219
3 04 2012-06-27 Electronics 4 Weekday Non-Holiday Summer 7.889956
4 04 2012-06-27 Electronics 5 Weekday Non-Holiday Summer 7.893943

IV. Exploratory Data Analysis

A. Statistical Analysis

Description of the Data:

Code
print("------------------------------------\n")
print("Smart Meter Data: \n\n", df1.describe())
print("\n------------------------------------\n")
print("Plugs Data: \n\n", df2.describe())
print("\n------------------------------------")
------------------------------------

Smart Meter Data: 

                Hour  powerallphases
count  14400.000000    14400.000000
mean      12.500000      642.279757
std        6.922427      725.089139
min        1.000000        0.000000
25%        6.750000      238.821080
50%       12.500000      452.466749
75%       18.250000      777.330296
max       24.000000     6967.335401

------------------------------------

Plugs Data: 

                Hour       Reading
count  65952.000000  65952.000000
mean      12.500000     21.684298
std        6.922239     34.054088
min        1.000000      0.000000
25%        6.750000      2.120461
50%       12.500000      9.215207
75%       18.250000     20.061170
max       24.000000    793.706827

------------------------------------

B. Visual Analysis

i. Time Series Analysis

The time series analysis is done on the smart meter data. It is done to see what are the trends of the total electricity consumption over time of all 3 households.

Code
# Create a new dataframe
df_plot1 = df1.groupby(["Date", "Household"])["powerallphases"].mean().reset_index()

# Create a figure object
fig = go.Figure()
for household in df_plot1["Household"].unique():
    # Add a trace for each household
    fig.add_trace(
        go.Scatter(
            x=df_plot1[df_plot1["Household"] == household]["Date"],
            y=df_plot1[df_plot1["Household"] == household]["powerallphases"],
            mode="lines",
            name=household,
        )
    )

# Update the layout of the plot
fig.update_xaxes(title=dict(text="Date", font=dict(size=15)), tickformat="%Y/%m/%d")
fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)))
fig.update_layout(
    title=dict(text="Total Power Consumption over Time", font=dict(size=20)),
    legend_title_text="Household",
    legend=dict(x=0.905, y=0.99),
    template="ggplot2",
)

# Display the plot
fig.show()

Observations:

  • The total electricity consumption increases over time (from July to January) for all the 3 households.
  • If we look at the individual households, we can see that the total electricity consumption of household 4 is the highest, followed by household 5 and then household 6. The total electricity consumption of household 6 is the lowest. This puts forward many questions like why is the total electricity consumption of household 3 the lowest? Is it because the household is not using any appliances? Is it because the household is using appliances that consume less electricity? Is it because there are less people in the household? and many more.
  • It is also clear that Household 6 has almost constant electricity consumed during the period of 10 November 2012 to 4 January 2013.
  • For Household 4, on 13 December 2012, the total electricity consumption is very high. These can be explored by doing further analysis.

Further Analysis for Households 4 and 6:

It is evident from the above plot that there are some anomalies in the data for households 4 and 6. To further analyze these anomalies, the Plugs level data for these 2 households is seen separately.

Household 4:

On 13 December 2012, the total electricity consumption is very high. Plotting the electricity consumption for each appliance for that day might give some insight into the reason for the high electricity consumption.

Code
# Create a new dataframe
df_plot2 = df2[["Appliance", "Hour", "Reading"]][
    (df2["Household"] == "04") & (df2["Date"] == "2012-12-13")
]

# Create a figure object
fig = go.Figure()

# Define a list of colors for each appliance
colors = ["#ff6664", "#ffcc68", "#a86ccd", "#67e068", "#7591bd"]

# Add a trace for each hour
for hour in df_plot2["Hour"].unique():
    fig.add_trace(
        go.Bar(
            x=df_plot2[df_plot2["Hour"] == hour]["Appliance"],
            y=df_plot2[df_plot2["Hour"] == hour]["Reading"],
            name=str(hour),
            visible=False,
            marker_color=colors,
        )
    )

# Update the layout of the plot
fig.update_xaxes(title=dict(text="Appliance", font=dict(size=15)))
fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])
fig.update_layout(
    title=dict(
        text="Power Consumption of Each Appliance (for Household 4) for Hour 1 <br><sup>Note: The power consumption is only for 13 December 2012</sup>",
        font=dict(size=20),
    ),
    template="ggplot2",
)

fig.data[0].visible = True

# Create and add slider
steps = []
for i in range(len(fig.data)):
    step = dict(
        method="update",
        args=[
            {"visible": [False] * len(fig.data)},
            {
                "title": "Power Consumption of Each Appliance (for Household 4) for Hour "
                + str(i + 1) + "<br><sup>Note: The power consumption is only for 13 December 2012</sup>"
            },
        ],  # layout attribute
        label=i + 1,
    )
    step["args"][0]["visible"][i] = True  # Toggle i'th trace to "visible"
    steps.append(step)

sliders = [
    dict(active=0, currentvalue={"prefix": "Hour: "}, pad={"t": 30}, steps=steps)
]

# Update the layout
fig.update_layout(sliders=sliders)

# Display the plot
fig.show()

Observations:

  • The electricity consumption for Refrigerator is high throughout the day.
  • Starting from 2:00 PM (Hour 14), it can be seen that there is a high usage of Entertainment appliances which include TV and Stereo.
  • At 7:00 PM (Hour 19), there is a unusually high spike in the electricity consumption for Kitchen appliances.

Household 6:

From 10 November 2012 to 4 January 2013, the total electricity consumption is almost constant. Plotting the electricity consumption for each appliance for that period might give some insight into the reason for the constant electricity consumption.

Code
# Create a new dataframe
df_plot3 = df2[["Date", "Appliance", "Hour", "Reading"]][
    (df2["Household"] == "06")
    & (df2["Date"] > "2012-11-09")
    & (df2["Date"] < "2013-01-05")
]
df_plot3 = df_plot3.groupby(["Date", "Appliance"])["Reading"].mean().reset_index()

# Create a figure object
fig = go.Figure()

# Define a list of colors for each appliance
colors = ["#ff6664", "#ffcc68", "#a86ccd", "#67e068", "#7591bd"]
i = 0

# Add a trace for each appliance
for appliance in df_plot3["Appliance"].unique():
    fig.add_trace(
        go.Bar(
            x=df_plot3[df_plot3["Appliance"] == appliance]["Date"],
            y=df_plot3[df_plot3["Appliance"] == appliance]["Reading"],
            name=appliance,
            visible=False,
            marker_color=colors[i],
        )
    )
    
    i = i+1
    
# Update the layout of the plot
fig.update_xaxes(title=dict(text="Date", font=dict(size=15)), tickformat="%Y/%m/%d")
fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])
fig.update_layout(
    title=dict(
        text="Average Power Consumption of Electronics (for Household 6) <br><sup>Note: The power consumption is only from 10 November 2012 to 4 January 2013</sup>",
        font=dict(size=20),
    ),
    template="ggplot2",
)

fig.data[0].visible = True

button_list = []
for i in range(len(fig.data)):
    
    button_dict = dict(
        label = fig.data[i].name,
        method="update",
        args=[{"visible": [False] * len(fig.data)},
                {"title": "Average Power Consumption of " + fig.data[i].name + " (for Household 6) <br><sup>Note: The power consumption is only from 10 November 2012 to 4 January 2013</sup>"}]  # layout attribute
    )
    
    button_dict["args"][0]["visible"][i] = True  # Toggle i'th trace to "visible"
    
    button_list.append(button_dict)
    
fig.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=button_list,
            direction="down",
            pad={"b": 20, "t": 20},
            showactive=True,
            x=0.86,
            xanchor="left",
            y=1.21,
            yanchor="top"
        ),
    ]
)

# Display the plot
fig.show()

Observations:

  • Every appliance except Entertainment appliances has constant and very low consumption throughout the period.
  • The Entertainment appliances have a constant and low consumption but comparatively higher than the other appliances.
  • Another interesting observation is that since the Lamp is triggered by the Motion Sensor, it might be possible that there is some defect in the Lamp since all other observations conclude to the family not being at home.

ii. Appliance Analysis

The appliance analysis is done on the plugs data. It is done to see what are the trends of the electricity consumption of each appliance over time for each household. This also helps in identifying some living patterns of the families living in the households.

Living trends analysis:

To identify the living trends of the families living in the households, the electricity consumption of each appliance is analyzed for each household.

Code
# Create a new dataframe
df_plot4 = df2.groupby(["Hour", "Appliance", "Household"]).mean().reset_index()

# Create a figure object
fig = go.Figure()

# Define lists
colors = ["#ff6664", "#ffcc68", "#fee664", "#a86ccd", "#67e068", "#7591bd"]
households = df_plot4["Household"].unique()
hours = df_plot4["Hour"].unique()

# Add a trace for each appliance
for household in households:
    
    for hour in hours:
        
        fig.add_trace(
            go.Bar(
                x=df_plot4[
                    (df_plot4["Household"] == household) & (df_plot4["Hour"] == hour)
                ]["Appliance"],
                y=df_plot4[
                    (df_plot4["Household"] == household) & (df_plot4["Hour"] == hour)
                ]["Reading"],
                name=appliance + " - Household " + household,
                visible=False,
                marker_color=colors,
            )
        )

# Update the layout of the plot
fig.update_xaxes(title=dict(text="Appliance", font=dict(size=15)))
fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])
fig.update_layout(
    title=dict(
        text="Average Power Consumption of Appliances for at Hour 1 <br><br>",
        font=dict(size=20),
    ),
    template="ggplot2",
)

fig.data[0].visible = True

# Create slider steps for each household
slider_steps = []
for i, household in enumerate(households):
    
    step = []

    for j, hour in enumerate(hours):
        
        visible = [False] * len(fig.data)
        visible[i * len(hours) + j] = True
        step.append({"label": j+1,
                     "method": "update", 
                     "args": [{"visible": visible},
                              {"title": "Average Power Consumption of Appliances at Hour " + str(hour) + "<br><br>"}
                              ]
                     }
                    )

    slider_steps.append(step)

# Create a slider for each household
sliders = []
for i, household in enumerate(households):
    
    slider = [
        dict(
            active=0,
            currentvalue={"prefix": "Hour: "},
            pad={"t": 30},
            steps=slider_steps[i],
        )
    ]
    sliders.append(slider)
    
# Add the slider for the first household to the figure layout
fig.update_layout(sliders=sliders[0])

# Create a list of visible traces for each household
visible_list = []
for i in range(0, len(fig.data), len(hours)):
    visible = [False] * len(fig.data)
    visible[i] = True
    visible_list.append(visible)

# Create a button for each continent
buttons = []
for i, household in enumerate(households):
    button = dict(
        label="Household " + str(int(household)),
        method="update",
        args=[{"visible": visible_list[i]}, {"sliders": sliders[i]}],
    )
    buttons.append(button)

# Add the buttons to the figure layout
fig.update_layout(
    updatemenus=[
        dict(
            type="buttons",
            active=0,
            buttons=buttons,
            direction="right",
            pad={"r": 10, "t": 15},
            x=0.355,
            xanchor="left",
            y=1.2,
            yanchor="top",
        ),
    ]
)

# Show the figure
fig.show()

Observations:

  • Household 4: There is high usage of Refrigerator throughout the day. It can also be seen there are spikes in the usage of Kitchen appliances at 6:00 AM, 12:00 PM and 7:00 PM. Adding on to that, Entertainment appliances are used just after the Kitchen appliances. There is also no use of appliances from 12:00 AM to 6:00 AM.
  • Household 5: As compared to Household 4, the use of Refrigerator is low but still higher that all their other appliances. There is less usage of Kitchen appliances as compared to Household 4. In contrast to household 4, this family watches more TV starting from 6:00 PM since there is a lot of usage for Entertainment appliances. There is also no use of appliances from 11:00 PM to 7:00 AM.
  • Household 6: All the appliances used have a very low average power consumption throughout the day. Hence it is difficult to make any conclusions about the living patterns of the family living in this household. Hence I will not be doing any further analysis on this household.

Consumption Analysis based on Day, Holiday and Season:

To analyze the usage of appliances on weekdays, weekends, different seasons or holidays, the data is grouped by the respective categories and the average power consumption of each appliance is calculated. The data is then plotted to see the trends.

Code
# Create a new dataframe
df2["Month"] = df2["Date"].dt.month
df2["Month"] = df2["Month"].apply(
    lambda x: datetime.datetime.strptime(str(x), "%m").strftime("%B")
)
df_plot5 = df2.groupby(["Month", "Appliance", "Day"])["Reading"].mean().reset_index()
df_plot6 = (
    df2.groupby(["Month", "Appliance", "Holiday"])["Reading"].mean().reset_index()
)
df_plot7 = df2.groupby(["Month", "Appliance", "Season"])["Reading"].mean().reset_index()

# Create selection attributes
bar = alt.selection(type="single", encodings=["x"])
color = alt.condition(bar, alt.Color("Appliance:N"), alt.value("lightgray"))

# Chart 1
chart1 = (
    alt.Chart(df_plot5)
    .mark_bar()
    .encode(
        x=alt.X("Appliance:N", title=""),
        y=alt.Y(
            "mean(Reading):Q",
            title="Average Power Consumption",
            axis=alt.Axis(titleFontSize=12),
        ),
        tooltip=["Appliance", "mean(Reading)"],
        color=color,
        column=alt.Column("Day:N", title=""),
    )
    .properties(
        title={
            "fontSize": 15,
            "text": "By Day:",
            "align": "left",
            "anchor": "start",
            "color": "gray",
        },
        width=alt.Step(28),
    )
    .add_selection(bar)
)

# Chart 2
chart2 = (
    alt.Chart(df_plot6)
    .mark_bar()
    .encode(
        x=alt.X("Appliance:N", title=""),
        y=alt.Y(
            "mean(Reading):Q",
            title="Average Power Consumption",
            axis=alt.Axis(titleFontSize=12),
        ),
        tooltip=["Appliance", "mean(Reading)"],
        color=color,
        column=alt.Column("Holiday:N", title=""),
    )
    .properties(
        title={
            "fontSize": 15,
            "text": "By Holiday:",
            "align": "left",
            "anchor": "start",
            "color": "gray",
        },
        width=alt.Step(28),
    )
    .add_selection(bar)
)

# Chart 3
chart3 = (
    alt.Chart(df_plot7)
    .mark_bar()
    .encode(
        x=alt.X("Appliance:N", title=""),
        y=alt.Y(
            "mean(Reading):Q",
            title="Average Power Consumption",
            axis=alt.Axis(titleFontSize=12),
        ),
        tooltip=["Appliance", "mean(Reading)"],
        color=color,
        column=alt.Column("Season:N", title=""),
    )
    .properties(
        title={
            "fontSize": 15,
            "text": "By Season:",
            "align": "left",
            "anchor": "start",
            "color": "gray",
        },
        width=alt.Step(40),
    )
    .add_selection(bar)
)

# Combine the charts
chart = alt.hconcat(chart1, chart2)
chart = alt.vconcat(chart, chart3, title="Power Consumption of each Appliance")
chart = chart.configure_title(fontSize=20, anchor="middle").configure_view(
    fill="#ededed"
)

# Show the chart
chart

Observations:

  • For Entertainment appliances, there is a higher consumption on weekends as compared to weekdays. This is also the case for Holidays and Non-Holidays. During summer, the consumption of electricity by entertainment appliances is far less than the other seasons.
  • For Refrigerator, there is a higher consumption on weekends as compared to weekdays. The use of refrigerator is more in Summer and Fall as compared to Winter.
  • Moving to Kitchen appliances, there is a higher consumption on weekdays as compared to weekends. This is not the case for Holidays. Different seasons do not affect the use of kitchen appliances by a lot but there is a slight increase in the use of kitchen appliances during Fall and Winter as compared to Summer.
  • Another interesting appliance is Fountain. It is observed that there is little to no variation in the usage of fountain for weekends, weekdays, holidays and seasons. But during fall and winter, there is no consumption by fountain.
  • For Lamp, there is a higher consumption during winter as compared to summer and fall.

V. Results and Inferences

It is observed that the total electricity consumption increases over time for all three households. However, the total electricity consumption of household 4 is the highest, followed by household 5 and then household 6, with household 6 having the lowest consumption.

Further analysis of the plugs level data for households 4 and 6 reveals some anomalies in the data. On 13 December 2012, household 4 had a very high electricity consumption, and plotting the electricity consumption for each appliance for that day revealed that the high consumption was due to high usage of entertainment and kitchen appliances, possibly because the family had guests over for dinner. On the other hand, household 6 had almost constant electricity consumption from 10 November 2012 to 4 January 2013. The analysis of the electricity consumption for each appliance during this period reveals that every appliance except entertainment appliances had constant and very low consumption, which suggests that the family might have been away from home during this period.

The appliance analysis of the plugs data provides some interesting insights into the living patterns of the families living in the households. Household 4 had high usage of the refrigerator throughout the day, with spikes in the usage of kitchen appliances at specific times, suggesting that the family makes breakfast, lunch, and dinner at specific times throughout the day. The family also watches TV after using the kitchen appliances which indicated that they watch TV after cooking food, and there is no usage of appliances from midnight to 6:00 am, which suggests that the family sleeps during this time. In contrast, Household 5 had lower usage of the refrigerator but still higher than all other appliances, and a lower usage of Kitchen Appliances indicating that the family is busy and does not cook many meals at home. This family also watches more TV as compared to the family of Household 4, and there is no usage of appliances from 11:00 pm to 7:00 am, suggesting that the family sleeps during this time.

The consumption analysis based on day, holiday, and season reveals that the usage of Entertainment appliances is higher on weekends and holidays which is consistent with the observation that people tend to watch more TV during their leisure time. Moving to Kitchen appliances, there is a higher consumption on weekdays as compared to weekends. During weekdays, families go for work and usually get up on time, make food and carry tiffins and stay on the same routine throughout. But during weekends, they do not have to get up and work and can enjoy their day offs by going out for food and spending time with family more. This is not the case for Holidays. One would expect that on holidays, people will go out for food and hence the use of kitchen appliances will reduce. But it can also be assumed that, on holidays families have guests over or cook traditional dishes to celebrate the holiday and hence the use of kitchen appliances increases. Different seasons do not affect the use of kitchen appliances by a lot but there is a slight increase in the use of kitchen appliances during Fall and Winter as compared to Summer. For Refrigerator, there is a higher consumption on weekends as compared to weekdays. This is because people tend to go out more on weekends and hence the refrigerator is not used as much. The use of refrigerator is more in Summer and Fall as compared to Winter. This is fairly obvious, that people do not need to store food inside the refrigerator or eat colder food since the weather is already cold. Another interesting appliance is Fountain. It is observed that there is little to no variation in the usage of fountain for weekends, weekdays, holidays and seasons. But during fall and winter, there is no consumption by fountain since it is cold outside and the water will freeze and hence the fountain will not be used. Lastly for Lamp, there is a higher consumption during winter as compared to summer and fall. This is because people tend to stay indoors more during winter and hence the lamp is used more. This is also because the duration of daylight is less during winter and hence the lamp is used more to provide light.

Overall, the analysis provides some valuable insights into the electricity consumption patterns of the households and their living patterns, which can be used to develop energy-saving strategies and improve energy efficiency.

References

Beckel, Christian, Wilhelm Kleiminger, Romano Cicchetti, Thorsten Staake, and Silvia Santini. 2014. “The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms.” Proceedings of the 1st ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014).
Kleiminger, Wilhelm, Christian Beckel, and Silvia Santini. 2015. “Household Occupancy Monitoring Using Electricity Meters.” Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015).
“Switzerland Holidays.” n.d.
Source Code
---
title: <b>Electricity Consumption Analysis</b>
author: Ramdayal Rewaria
format:
  html:
    self-contained: true
    embed-resources: true
    page-layout: full
    code-fold: true
    toc: true
    toc-depth: 6
    toc-expand: 4
    toc-location: left
    theme: lumen
execute:
  echo: true
bibliography: bibliography.bib
jupyter: python3
---

# I. Data

The ECO (Electricity Consumption and Occupancy) dataset is a valuable resource for non-intrusive load monitoring and occupancy detection research. The dataset was collected in six Swiss households over a period of eight months, providing a comprehensive view of energy consumption patterns and occupancy information. The ECO dataset is open-source and provided under the Creative Commons License CC BY 4.0, which means that researchers can freely use the data for non-commercial purposes, as long as they give credit to the authors. [@Paper1][@Paper2]

The ECO dataset contains three types of data: 1 Hz aggregate consumption data, 1 Hz plug-level data, and occupancy information. The 1 Hz aggregate consumption data contains data on current, voltage, and phase shift for each of the three phases in the household. This data was collected from the smart meter in each household and provides a high-level view of the overall energy consumption patterns in the home.

The 1 Hz plug-level data was collected from selected appliances in each household, providing a more detailed view of energy consumption patterns for specific devices. This data can be used to identify opportunities for energy savings and to develop more accurate load monitoring algorithms.

The occupancy information was collected through a tablet computer and a passive infrared sensor in some of the households. The tablet computer was used for manual labeling of occupancy information, while the passive infrared sensor provided automatic detection of occupancy information. This information can be used to develop occupancy detection algorithms that can be used to optimize energy consumption in the home.

For this project, I will focus on households 4, 5, and 6 and explore the dataset using Python and various visualization libraries.

# II. Data Science Questions

1. What is the overall trend in electricity consumption for households 4, 5, and 6 over the 8-month period?

2. Which appliances consume the most electricity in each household, and how does this vary over time?

3. Are there any anomalies or outliers in the data that may require further investigation?

4. Is there a difference in electricity consumption patterns of appliances for weekdays and weekends? 

5. Is there a difference in electricity consumption patterns of appliances for holidays and non-holidays?

6. Are the electricity consumption patterns of appliances related to seasons?

# III. Data Preparation

The dataset is provided in the form of folders for each household (smart meter and plug-level data). [@Paper1][@Paper2]

For the smart meter data, each CSV file contains aggregate consumption data of a single day and is named accordingly (e.g., "2012-08-31.csv"). Each file contains 86,400 rows (i.e., one row for each second). Rows with missing measurements are denoted as "-1". The coverage specifies the proportion of "existing" values, i.e., values that are > -1. The rows consist of the following comma separated values:

* powerallphases: Sum of real power over all phases
* powerl1: Real power phase 1
* powerl2: Real power phase 2
* powerl3: Real power phase 3
* currentneutral: Neutral current
* currentl1: Current phase 1
* currentl2: Current phase 2
* currentl3: Current phase 3
* voltagel1: Voltage phase 1
* voltagel2: Voltage phase 2
* voltagel3: Voltage phase 3
* phaseanglevoltagel2l1: Phase shift between voltage on phase 2 and 1
* phaseanglevoltagel3l1: Phase shift between voltage on phase 3 and 1
* phaseanglecurrentvoltagel1: Phase shift between current/voltage on phase 1
* phaseanglecurrentvoltagel2: Phase shift between current/voltage on phase 2 
* phaseanglecurrentvoltagel3: Phase shift between current/voltage on phase 3

Like the smart meter data, the plug data is also provided on a daily based (i.e., 86,400 measurements per day) in CSV format. In contrast to the smart meter data, it contains only one value:

* consumption: Real power measured by the plug

Each file contains plug measurements for the whole household. The (daily) data for each plug is stored in subfolders named after the plug ID (e.g., "05") described above.


## A. Import Libraries

```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import altair as alt

pio.renderers.default = "plotly_mimetype+notebook_connected"

import glob
import datetime
import warnings

warnings.filterwarnings("ignore")
```

## B. Data Processing

The raw data needs to be processed and cleaned before it can be used for analysis.


### i. Extracting Data and Creating Dataframe

The data is extracted from the CSV files and stored in a Pandas dataframe. 4 functions are used to extract the data from the CSV files and create the dataframe:

* **extract_plugs_data**: Extracts the data from the CSV files of plug-level data and stores it in a Pandas dataframe.
* **extract_sm_data**: Extracts the data from the CSV files of smart meter data and stores it in a Pandas dataframe.
* **create_plugs_df**: Combines the data from the CSV files of plug-level data into a single dataframe.
* **create_sm_df**: Combines the data from the CSV files of smart meter data into a single dataframe.

```{python}
# Define a function to extract the data from a CSV file and format it into a dataframe


def extract_plugs_data(csv_file_path, household):
    """
    Reads a CSV file located at the given file path and extracts plug-level data from it.
    It then adds additional columns to the DataFrame, including the date of the data,
    the household number, and the appliance type.

    Returns:
        _type_: pandas DataFrame
    """

    df = pd.read_csv(csv_file_path, header=None)
    date_str = csv_file_path.split("/")[-1].replace(".csv", "")
    date = datetime.datetime.strptime(date_str, "%Y-%m-%d")
    subfolder = csv_file_path.split("/")[-2]
    df["Date"] = date
    df["Household"] = household
    df["Appliance"] = subfolder
    df.columns = ["Reading", "Date", "Household", "Appliance"]

    return df


def extract_sm_data(csv_file_path, household):
    """
    Reads a CSV file located at the given file path and extracts smart meter data from it.
    It then adds additional columns to the DataFrame, including the date of the data and
    the household number.

    Returns:
        _type_: pandas DataFrame
    """

    df = pd.read_csv(csv_file_path, header=None)
    date_str = csv_file_path.split("/")[-1].replace(".csv", "")
    date = datetime.datetime.strptime(date_str, "%Y-%m-%d")
    df["Date"] = date
    df["Household"] = household
    df.columns = [
        "powerallphases",
        "powerl1",
        "powerl2",
        "powerl3",
        "currentneutral",
        "currentl1",
        "currentl2",
        "currentl3",
        "voltagel1",
        "voltagel2",
        "voltagel3",
        "phaseanglevoltagel2l1",
        "phaseanglevoltagel3l1",
        "phaseanglecurrentvoltagel1",
        "phaseanglecurrentvoltagel2",
        "phaseanglecurrentvoltagel3",
        "Date",
        "Household",
    ]
    # Drop the columns that are not needed
    df.drop(
        [
            "powerl1",
            "powerl2",
            "powerl3",
            "currentneutral",
            "currentl1",
            "currentl2",
            "currentl3",
            "voltagel1",
            "voltagel2",
            "voltagel3",
            "phaseanglevoltagel2l1",
            "phaseanglevoltagel3l1",
            "phaseanglecurrentvoltagel1",
            "phaseanglecurrentvoltagel2",
            "phaseanglecurrentvoltagel3",
        ],
        axis=1,
        inplace=True,
    )
    return df


def create_plugs_df(folder_list, household_list):
    """
    This function takes in a list of folder paths and a corresponding list of
    household IDs as input. It then loops over each folder path, extracts data from CSV
    files in the subfolders, and creates a list of dataframes. These dataframes are then
    combined into a single dataframe using the pandas concat() function and returned as output.

    Returns:
        _type_: pandas DataFrame
    """

    # Create a list to store the extracted dataframes
    df_list = []

    # Loop over each folder path and extract the data from the CSV files
    for i in range(len(folder_list)):
        folder_path = folder_list[i]
        household = household_list[i]
        subfolder_paths = glob.glob(folder_path + "/*")

        for subfolder_path in subfolder_paths:
            csv_files = glob.glob(subfolder_path + "/*.csv")

            for csv_file in csv_files:
                df = extract_plugs_data(csv_file, household)
                df_list.append(df)

    # Combine the extracted dataframes into a single dataframe
    plugs_df = pd.concat(df_list, ignore_index=True)

    return plugs_df


def create_sm_df(folder_list, household_list):
    """
    This function takes in a list of folder paths and a corresponding list of
    household IDs as input. It then loops over each folder path, extracts data from CSV
    files, and creates a list of dataframes. These dataframes are then
    combined into a single dataframe using the pandas concat() function and returned as output.

    Returns:
        _type_: pandas DataFrame
    """

    # Create a list to store the extracted dataframes
    df_list = []

    # Loop over each folder path and extract the data from the CSV files
    for i in range(len(folder_list)):
        folder_path = folder_list[i]
        household = household_list[i]
        csv_files = glob.glob(folder_path + "/*.csv")

        for csv_file in csv_files:
            df = extract_sm_data(csv_file, household)
            df_list.append(df)

    # Combine the extracted dataframes into a single dataframe
    sm_df = pd.concat(df_list, ignore_index=True)

    return sm_df


# Set the paths to the folders containing the data
folder_paths_sm = ["eco/04_sm", "eco/05_sm", "eco/06_sm"]
folder_paths_plugs = ["eco/04_plugs", "eco/05_plugs", "eco/06_plugs"]
household_nums = ["04", "05", "06"]

# Store the data in a dataframe
sm_df = create_sm_df(folder_paths_sm, household_nums)
plugs_df = create_plugs_df(folder_paths_plugs, household_nums)

# Store it in new dataframes to preserve the original data
df1 = sm_df.copy()
df2 = plugs_df.copy()
```

Raw Smart Meter data:

```{python}
df1.head()
```

Raw Plugs Data:

```{python}
df2.head()
```

### ii. Data Cleaning and Feature Engineering

_Dealing with missing values:_

In the plugs level data, the Readings on 25th July 2012 are missing. All the rows of the 25th July 2012 are removed from the data. Every other missing value is imputed as -1 (as per the dataset description [@Paper1][@Paper2]). These values are imputed as 0 to avoid any anomalies while doing analysis.

```{python}
df1["powerallphases"][df1["powerallphases"] == -1] = 0
df2["Reading"][df2["Reading"] == -1] = 0

# Drop all rows with Date as 2012-06-25
df2 = df2[df2["Date"] != "2012-06-25"]
```

_Down-sizing the data:_

Since for each day, there are 86,400 rows, the data should be reduced so that it is not computationallly expensive to do analysis. Here is the shape of the 2 dataframes before downsizing:

```{python}
print("The shape of Smart Meter data before downsizing is:", sm_df.shape)
print("The shape of Plugs level data before downsizing is:", plugs_df.shape)
```

For down-sizing, the 86,400 rows are grouped into 24 groups of 3600 rows each. The mean of each Readings are taken and stored in a new dataframe. This is done for both the smart meter and plug level data.

```{python}
# Create a new column that contains the hour of the day

df1["Hour"] = (df1.groupby(["Household", "Date"]).cumcount() // 3600) + 1
df1 = df1.groupby(["Household", "Date", "Hour"], as_index=False).mean()

df2["Hour"] = (df2.groupby(["Household", "Date", "Appliance"]).cumcount() // 3600) + 1
df2 = df2.groupby(["Household", "Date", "Appliance", "Hour"], as_index=False).mean()
```

Here is the shape of the 2 dataframes after downsizing:

```{python}
print("The shape of Smart Meter data after downsizing is:", df1.shape)
print("The shape of Plugs level data after downsizing is:", df2.shape)
```

_Feature Engineering:_

Many meaningful data can be extracted from the existing columns.

Since the Date column is already is datetime format, the Day of the week can be extracted from it. From this day of the week, we can extract the weekend and weekday information. This information can be used to see if there is any difference in the electricity consumption on weekends and weekdays.

```{python}
df1["Day"] = ""

for i in range(len(df1)):
    day = df1["Date"][i].weekday()

    if day < 5:
        df1["Day"][i] = "Weekday"

    else:
        df1["Day"][i] = "Weekend"

df2["Day"] = ""

for i in range(len(df2)):
    day = df2["Date"][i].weekday()

    if day < 5:
        df2["Day"][i] = "Weekday"

    else:
        df2["Day"][i] = "Weekend"
```

The Date column can also be used to add Holiday column. This column can be used to see if there is any difference in the electricity consumption on holidays and non-holidays.

_List of Holidays in Switzerland for the period of June 2012 to January 2013 [@holiday]:_

| Date        | Day       | Holiday Name                 | Type of Holiday      |
|-------------|-----------|------------------------------|----------------------|
| Jun 7       | Thursday | Corpus Christi               | Common local holiday |
| Jun 23      | Saturday | Jura Independence Day        | Common local holiday |
| Jun 29      | Friday   | St. Peter and St. Paul       | Common local holiday |
| Aug 1       | Wednesday| Swiss National Day           | National holiday     |
| Aug 15      | Wednesday| Assumption of Mary           | Common local holiday |
| Sep 6       | Thursday | Genevese Fast                | Common local holiday |
| Sep 16      | Sunday   | Swiss Federal Fast           | Common local holiday |
| Sep 17      | Monday   | Federal Fast Monday          | Common local holiday |
| Sep 22      | Saturday | Mauritius Day                | Common local holiday |
| Sep 25      | Tuesday  | St. Nicholas of Flüe         | Common local holiday |
| Nov 1       | Thursday | All Saints' Day              | Common local holiday |
| Dec 8       | Saturday | Immaculate Conception        | Common local holiday |
| Dec 25      | Tuesday  | Christmas Day                | Common local holiday |
| Dec 26      | Wednesday| St. Stephen's Day            | Common local holiday |
| Dec 31      | Monday   | Restoration Day              | Common local holiday |
| Jan 1       | Tuesday  | New Year's Day               | Common local holiday |
| Jan 2       | Wednesday| Berchtold Day                | Common local holiday |
| Jan 6       | Sunday   | Epiphany                     | Common local holiday |

```{python}
# Make a list for the holidays using above markdown cell

holiday_list = [
    "2012-06-07",
    "2012-06-23",
    "2012-06-29",
    "2012-08-01",
    "2012-08-15",
    "2012-09-06",
    "2012-09-16",
    "2012-09-17",
    "2012-09-22",
    "2012-09-25",
    "2012-11-01",
    "2012-12-08",
    "2012-12-25",
    "2012-12-26",
    "2012-12-31",
    "2013-01-01",
    "2013-01-02",
    "2013-01-06",
]

# Create a new column that contains the holiday information

df1["Holiday"] = ""

for i in range(len(df1)):
    if df1["Date"][i].strftime("%Y-%m-%d") in holiday_list:
        df1["Holiday"][i] = "Holiday"

    else:
        df1["Holiday"][i] = "Non-Holiday"

df2["Holiday"] = ""

for i in range(len(df2)):
    if df2["Date"][i].strftime("%Y-%m-%d") in holiday_list:
        df2["Holiday"][i] = "Holiday"

    else:
        df2["Holiday"][i] = "Non-Holiday"
```

In Switzerland, the approximate season dates for 2012 and 2013 are as follows:

* Spring 2012: March 20 - June 19
* Summer 2012: June 20 - September 21
* Fall 2012: September 22 - December 1
* Winter 2012: December 2, 2012 - March 19, 2013

```{python}
# Define a custom function to map dates to seasons
def get_season(date):
    """
    Creates a new column that contains the season information.
    """

    spring_start = pd.Timestamp(year=date.year, month=3, day=20)
    summer_start = pd.Timestamp(year=date.year, month=6, day=20)
    fall_start = pd.Timestamp(year=date.year, month=9, day=22)
    winter_start = pd.Timestamp(year=date.year, month=12, day=2)

    if date < spring_start or date >= winter_start:
        return "Winter"
    elif date < summer_start:
        return "Spring"
    elif date < fall_start:
        return "Summer"
    else:
        return "Fall"


# Apply the function to the "Date" column and create a new "Season" column
df1["Season"] = df1["Date"].apply(get_season)
df2["Season"] = df2["Date"].apply(get_season)
```

_Convert the Appliance types from 01, 02 and so on to their respective appliance names for better understanding the data:_

```{python}
for i in range(len(df2)):
    if df2["Household"][i] == "04":
        appliance_types = {
            "Fridge": "01",
            "Kitchen appliances": "02",
            "Lamp": "03",
            "Stereo and laptop": "04",
            "Freezer": "05",
            "Tablet": "06",
            "Entertainment": "07",
            "Microwave": "08",
        }
        df2["Appliance"][i] = list(appliance_types.keys())[
            list(appliance_types.values()).index(df2["Appliance"][i])
        ]

    elif df2["Household"][i] == "05":
        appliance_types = {
            "Tablet": "01",
            "Coffee machine": "02",
            "Fountain": "03",
            "Microwave": "04",
            "Fridge": "05",
            "Entertainment": "06",
            "PC": "07",
            "Kettle": "08",
        }

        df2["Appliance"][i] = list(appliance_types.keys())[
            list(appliance_types.values()).index(df2["Appliance"][i])
        ]

    elif df2["Household"][i] == "06":
        appliance_types = {
            "Lamp": "01",
            "Laptop": "02",
            "Router": "03",
            "Coffee machine": "04",
            "Entertainment": "05",
            "Fridge": "06",
            "Kettle": "07",
        }

        df2["Appliance"][i] = list(appliance_types.keys())[
            list(appliance_types.values()).index(df2["Appliance"][i])
        ]

    else:
        pass
```

_I noticed that some appliances are redundant and can be grouped together for better analysis._ 

The unique appliance types are as follows:

```{python}
list(df2["Appliance"].unique())
```

Appliances like:

* `Microwave`, `Coffee machine` and `Kettle` can be combined into `Kitchen appliances`
* `Stereo and laptop`, `Tablet`, `PC`, `Router` and `Laptop` can be combined into `Electronics`
* `Fridge` and `Freezer` can be combined into `Refrigerator`

```{python}
for i in range(len(df2)):
    if (df2["Appliance"][i] == "Fridge") | (df2["Appliance"][i] == "Freezer"):
        df2["Appliance"][i] = "Refrigerator"

    elif (
        (df2["Appliance"][i] == "Stereo and laptop")
        | (df2["Appliance"][i] == "Laptop")
        | (df2["Appliance"][i] == "PC")
        | (df2["Appliance"][i] == "Tablet")
        | (df2["Appliance"][i] == "Router")
    ):
        df2["Appliance"][i] = "Electronics"

    elif (
        (df2["Appliance"][i] == "Microwave")
        | (df2["Appliance"][i] == "Coffee machine")
        | (df2["Appliance"][i] == "Kettle")
        | (df2["Appliance"][i] == "Kitchen appliances")
    ):
        df2["Appliance"][i] = "Kitchen"

    else:
        pass
```

The updated list of appliances are as follows:

```{python}
df2 = (
    df2.groupby(["Household", "Date", "Appliance", "Hour", "Day", "Holiday", "Season"])
    .mean()
    .reset_index()
)

list(df2["Appliance"].unique())
```

**_The final dataframes look like this:_**

Smart Meter Data:

```{python}
df1.to_csv("Smart_Meter_Data.csv", index=False)
df1.head()
```

Plugs Data:

```{python}
df2.to_csv("Plugs_Data.csv", index=False)
df2.head()
```

# IV. Exploratory Data Analysis

## A. Statistical Analysis

**Description of the Data:**

```{python}
print("------------------------------------\n")
print("Smart Meter Data: \n\n", df1.describe())
print("\n------------------------------------\n")
print("Plugs Data: \n\n", df2.describe())
print("\n------------------------------------")
```

## B. Visual Analysis

### i. Time Series Analysis

The time series analysis is done on the smart meter data. It is done to see what are the trends of the total electricity consumption over time of all 3 households.

```{python}
# Create a new dataframe
df_plot1 = df1.groupby(["Date", "Household"])["powerallphases"].mean().reset_index()

# Create a figure object
fig = go.Figure()
for household in df_plot1["Household"].unique():
    # Add a trace for each household
    fig.add_trace(
        go.Scatter(
            x=df_plot1[df_plot1["Household"] == household]["Date"],
            y=df_plot1[df_plot1["Household"] == household]["powerallphases"],
            mode="lines",
            name=household,
        )
    )

# Update the layout of the plot
fig.update_xaxes(title=dict(text="Date", font=dict(size=15)), tickformat="%Y/%m/%d")
fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)))
fig.update_layout(
    title=dict(text="Total Power Consumption over Time", font=dict(size=20)),
    legend_title_text="Household",
    legend=dict(x=0.905, y=0.99),
    template="ggplot2",
)

# Display the plot
fig.show()
```

_Observations:_

* The total electricity consumption increases over time (from July to January) for all the 3 households. 
* If we look at the individual households, we can see that the total electricity consumption of household 4 is the highest, followed by household 5 and then household 6. The total electricity consumption of household 6 is the lowest. This puts forward many questions like why is the total electricity consumption of household 3 the lowest? Is it because the household is not using any appliances? Is it because the household is using appliances that consume less electricity? Is it because there are less people in the household? and many more.
* It is also clear that Household 6 has almost constant electricity consumed during the period of 10 November 2012 to 4 January 2013.
* For Household 4, on 13 December 2012, the total electricity consumption is very high. These can be explored by doing further analysis.

**Further Analysis for Households 4 and 6:**

It is evident from the above plot that there are some anomalies in the data for households 4 and 6. To further analyze these anomalies, the Plugs level data for these 2 households is seen separately.

_Household 4:_ 

On 13 December 2012, the total electricity consumption is very high. Plotting the electricity consumption for each appliance for that day might give some insight into the reason for the high electricity consumption.

```{python}
# Create a new dataframe
df_plot2 = df2[["Appliance", "Hour", "Reading"]][
    (df2["Household"] == "04") & (df2["Date"] == "2012-12-13")
]

# Create a figure object
fig = go.Figure()

# Define a list of colors for each appliance
colors = ["#ff6664", "#ffcc68", "#a86ccd", "#67e068", "#7591bd"]

# Add a trace for each hour
for hour in df_plot2["Hour"].unique():
    fig.add_trace(
        go.Bar(
            x=df_plot2[df_plot2["Hour"] == hour]["Appliance"],
            y=df_plot2[df_plot2["Hour"] == hour]["Reading"],
            name=str(hour),
            visible=False,
            marker_color=colors,
        )
    )

# Update the layout of the plot
fig.update_xaxes(title=dict(text="Appliance", font=dict(size=15)))
fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])
fig.update_layout(
    title=dict(
        text="Power Consumption of Each Appliance (for Household 4) for Hour 1 <br><sup>Note: The power consumption is only for 13 December 2012</sup>",
        font=dict(size=20),
    ),
    template="ggplot2",
)

fig.data[0].visible = True

# Create and add slider
steps = []
for i in range(len(fig.data)):
    step = dict(
        method="update",
        args=[
            {"visible": [False] * len(fig.data)},
            {
                "title": "Power Consumption of Each Appliance (for Household 4) for Hour "
                + str(i + 1) + "<br><sup>Note: The power consumption is only for 13 December 2012</sup>"
            },
        ],  # layout attribute
        label=i + 1,
    )
    step["args"][0]["visible"][i] = True  # Toggle i'th trace to "visible"
    steps.append(step)

sliders = [
    dict(active=0, currentvalue={"prefix": "Hour: "}, pad={"t": 30}, steps=steps)
]

# Update the layout
fig.update_layout(sliders=sliders)

# Display the plot
fig.show()
```

_Observations:_

* The electricity consumption for `Refrigerator` is high throughout the day. 
* Starting from 2:00 PM (Hour 14), it can be seen that there is a high usage of `Entertainment` appliances which include _TV and Stereo_.
* At 7:00 PM (Hour 19), there is a unusually high spike in the electricity consumption for `Kitchen appliances`.

_Household 6:_

From 10 November 2012 to 4 January 2013, the total electricity consumption is almost constant. Plotting the electricity consumption for each appliance for that period might give some insight into the reason for the constant electricity consumption.

```{python}
# Create a new dataframe
df_plot3 = df2[["Date", "Appliance", "Hour", "Reading"]][
    (df2["Household"] == "06")
    & (df2["Date"] > "2012-11-09")
    & (df2["Date"] < "2013-01-05")
]
df_plot3 = df_plot3.groupby(["Date", "Appliance"])["Reading"].mean().reset_index()

# Create a figure object
fig = go.Figure()

# Define a list of colors for each appliance
colors = ["#ff6664", "#ffcc68", "#a86ccd", "#67e068", "#7591bd"]
i = 0

# Add a trace for each appliance
for appliance in df_plot3["Appliance"].unique():
    fig.add_trace(
        go.Bar(
            x=df_plot3[df_plot3["Appliance"] == appliance]["Date"],
            y=df_plot3[df_plot3["Appliance"] == appliance]["Reading"],
            name=appliance,
            visible=False,
            marker_color=colors[i],
        )
    )
    
    i = i+1
    
# Update the layout of the plot
fig.update_xaxes(title=dict(text="Date", font=dict(size=15)), tickformat="%Y/%m/%d")
fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])
fig.update_layout(
    title=dict(
        text="Average Power Consumption of Electronics (for Household 6) <br><sup>Note: The power consumption is only from 10 November 2012 to 4 January 2013</sup>",
        font=dict(size=20),
    ),
    template="ggplot2",
)

fig.data[0].visible = True

button_list = []
for i in range(len(fig.data)):
    
    button_dict = dict(
        label = fig.data[i].name,
        method="update",
        args=[{"visible": [False] * len(fig.data)},
                {"title": "Average Power Consumption of " + fig.data[i].name + " (for Household 6) <br><sup>Note: The power consumption is only from 10 November 2012 to 4 January 2013</sup>"}]  # layout attribute
    )
    
    button_dict["args"][0]["visible"][i] = True  # Toggle i'th trace to "visible"
    
    button_list.append(button_dict)
    
fig.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=button_list,
            direction="down",
            pad={"b": 20, "t": 20},
            showactive=True,
            x=0.86,
            xanchor="left",
            y=1.21,
            yanchor="top"
        ),
    ]
)

# Display the plot
fig.show()
```

_Observations:_

* Every appliance except `Entertainment` appliances has constant and very low consumption throughout the period.
* The `Entertainment` appliances have a constant and low consumption but comparatively higher than the other appliances.
* Another interesting observation is that since the `Lamp` is triggered by the _Motion Sensor_, it might be possible that there is some defect in the Lamp since all other observations conclude to the family not being at home.

### ii. Appliance Analysis

The appliance analysis is done on the plugs data. It is done to see what are the trends of the electricity consumption of each appliance over time for each household. This also helps in identifying some living patterns of the families living in the households.

**Living trends analysis:**

To identify the living trends of the families living in the households, the electricity consumption of each appliance is analyzed for each household.

```{python}
# Create a new dataframe
df_plot4 = df2.groupby(["Hour", "Appliance", "Household"]).mean().reset_index()

# Create a figure object
fig = go.Figure()

# Define lists
colors = ["#ff6664", "#ffcc68", "#fee664", "#a86ccd", "#67e068", "#7591bd"]
households = df_plot4["Household"].unique()
hours = df_plot4["Hour"].unique()

# Add a trace for each appliance
for household in households:
    
    for hour in hours:
        
        fig.add_trace(
            go.Bar(
                x=df_plot4[
                    (df_plot4["Household"] == household) & (df_plot4["Hour"] == hour)
                ]["Appliance"],
                y=df_plot4[
                    (df_plot4["Household"] == household) & (df_plot4["Hour"] == hour)
                ]["Reading"],
                name=appliance + " - Household " + household,
                visible=False,
                marker_color=colors,
            )
        )

# Update the layout of the plot
fig.update_xaxes(title=dict(text="Appliance", font=dict(size=15)))
fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])
fig.update_layout(
    title=dict(
        text="Average Power Consumption of Appliances for at Hour 1 <br><br>",
        font=dict(size=20),
    ),
    template="ggplot2",
)

fig.data[0].visible = True

# Create slider steps for each household
slider_steps = []
for i, household in enumerate(households):
    
    step = []

    for j, hour in enumerate(hours):
        
        visible = [False] * len(fig.data)
        visible[i * len(hours) + j] = True
        step.append({"label": j+1,
                     "method": "update", 
                     "args": [{"visible": visible},
                              {"title": "Average Power Consumption of Appliances at Hour " + str(hour) + "<br><br>"}
                              ]
                     }
                    )

    slider_steps.append(step)

# Create a slider for each household
sliders = []
for i, household in enumerate(households):
    
    slider = [
        dict(
            active=0,
            currentvalue={"prefix": "Hour: "},
            pad={"t": 30},
            steps=slider_steps[i],
        )
    ]
    sliders.append(slider)
    
# Add the slider for the first household to the figure layout
fig.update_layout(sliders=sliders[0])

# Create a list of visible traces for each household
visible_list = []
for i in range(0, len(fig.data), len(hours)):
    visible = [False] * len(fig.data)
    visible[i] = True
    visible_list.append(visible)

# Create a button for each continent
buttons = []
for i, household in enumerate(households):
    button = dict(
        label="Household " + str(int(household)),
        method="update",
        args=[{"visible": visible_list[i]}, {"sliders": sliders[i]}],
    )
    buttons.append(button)

# Add the buttons to the figure layout
fig.update_layout(
    updatemenus=[
        dict(
            type="buttons",
            active=0,
            buttons=buttons,
            direction="right",
            pad={"r": 10, "t": 15},
            x=0.355,
            xanchor="left",
            y=1.2,
            yanchor="top",
        ),
    ]
)

# Show the figure
fig.show()
```

_Observations:_

* _Household 4:_ There is high usage of `Refrigerator` throughout the day. It can also be seen there are spikes in the usage of `Kitchen appliances` at 6:00 AM, 12:00 PM and 7:00 PM. Adding on to that, `Entertainment` appliances are used just after the Kitchen appliances. There is also no use of appliances from 12:00 AM to 6:00 AM.
* _Household 5:_ As compared to Household 4, the use of `Refrigerator` is low but still higher that all their other appliances. There is less usage of Kitchen appliances as compared to Household 4. In contrast to household 4, this family watches more TV starting from 6:00 PM since there is a lot of usage for `Entertainment` appliances. There is also no use of appliances from 11:00 PM to 7:00 AM.
* _Household 6:_ All the appliances used have a very low average power consumption throughout the day. Hence it is difficult to make any conclusions about the living patterns of the family living in this household. Hence I will not be doing any further analysis on this household.

**Consumption Analysis based on Day, Holiday and Season:**

To analyze the usage of appliances on weekdays, weekends, different seasons or holidays, the data is grouped by the respective categories and the average power consumption of each appliance is calculated. The data is then plotted to see the trends.

```{python}
# Create a new dataframe
df2["Month"] = df2["Date"].dt.month
df2["Month"] = df2["Month"].apply(
    lambda x: datetime.datetime.strptime(str(x), "%m").strftime("%B")
)
df_plot5 = df2.groupby(["Month", "Appliance", "Day"])["Reading"].mean().reset_index()
df_plot6 = (
    df2.groupby(["Month", "Appliance", "Holiday"])["Reading"].mean().reset_index()
)
df_plot7 = df2.groupby(["Month", "Appliance", "Season"])["Reading"].mean().reset_index()

# Create selection attributes
bar = alt.selection(type="single", encodings=["x"])
color = alt.condition(bar, alt.Color("Appliance:N"), alt.value("lightgray"))

# Chart 1
chart1 = (
    alt.Chart(df_plot5)
    .mark_bar()
    .encode(
        x=alt.X("Appliance:N", title=""),
        y=alt.Y(
            "mean(Reading):Q",
            title="Average Power Consumption",
            axis=alt.Axis(titleFontSize=12),
        ),
        tooltip=["Appliance", "mean(Reading)"],
        color=color,
        column=alt.Column("Day:N", title=""),
    )
    .properties(
        title={
            "fontSize": 15,
            "text": "By Day:",
            "align": "left",
            "anchor": "start",
            "color": "gray",
        },
        width=alt.Step(28),
    )
    .add_selection(bar)
)

# Chart 2
chart2 = (
    alt.Chart(df_plot6)
    .mark_bar()
    .encode(
        x=alt.X("Appliance:N", title=""),
        y=alt.Y(
            "mean(Reading):Q",
            title="Average Power Consumption",
            axis=alt.Axis(titleFontSize=12),
        ),
        tooltip=["Appliance", "mean(Reading)"],
        color=color,
        column=alt.Column("Holiday:N", title=""),
    )
    .properties(
        title={
            "fontSize": 15,
            "text": "By Holiday:",
            "align": "left",
            "anchor": "start",
            "color": "gray",
        },
        width=alt.Step(28),
    )
    .add_selection(bar)
)

# Chart 3
chart3 = (
    alt.Chart(df_plot7)
    .mark_bar()
    .encode(
        x=alt.X("Appliance:N", title=""),
        y=alt.Y(
            "mean(Reading):Q",
            title="Average Power Consumption",
            axis=alt.Axis(titleFontSize=12),
        ),
        tooltip=["Appliance", "mean(Reading)"],
        color=color,
        column=alt.Column("Season:N", title=""),
    )
    .properties(
        title={
            "fontSize": 15,
            "text": "By Season:",
            "align": "left",
            "anchor": "start",
            "color": "gray",
        },
        width=alt.Step(40),
    )
    .add_selection(bar)
)

# Combine the charts
chart = alt.hconcat(chart1, chart2)
chart = alt.vconcat(chart, chart3, title="Power Consumption of each Appliance")
chart = chart.configure_title(fontSize=20, anchor="middle").configure_view(
    fill="#ededed"
)

# Show the chart
chart
```

_Observations:_

* For `Entertainment` appliances, there is a higher consumption on weekends as compared to weekdays. This is also the case for Holidays and Non-Holidays. During summer, the consumption of electricity by entertainment appliances is far less than the other seasons.
* For `Refrigerator`, there is a higher consumption on weekends as compared to weekdays. The use of refrigerator is more in Summer and Fall as compared to Winter.
* Moving to `Kitchen appliances`, there is a higher consumption on weekdays as compared to weekends. This is not the case for Holidays. Different seasons do not affect the use of kitchen appliances by a lot but there is a slight increase in the use of kitchen appliances during Fall and Winter as compared to Summer.
* Another interesting appliance is `Fountain`. It is observed that there is little to no variation in the usage of fountain for weekends, weekdays, holidays and seasons. But during fall and winter, there is no consumption by fountain.
* For `Lamp`, there is a higher consumption during winter as compared to summer and fall.

# V. Results and Inferences

It is observed that the total electricity consumption increases over time for all three households. However, the total electricity consumption of household 4 is the highest, followed by household 5 and then household 6, with household 6 having the lowest consumption.

Further analysis of the plugs level data for households 4 and 6 reveals some anomalies in the data. On 13 December 2012, household 4 had a very high electricity consumption, and plotting the electricity consumption for each appliance for that day revealed that the high consumption was due to high usage of entertainment and kitchen appliances, possibly because the family had guests over for dinner. On the other hand, household 6 had almost constant electricity consumption from 10 November 2012 to 4 January 2013. The analysis of the electricity consumption for each appliance during this period reveals that every appliance except entertainment appliances had constant and very low consumption, which suggests that the family might have been away from home during this period.

The appliance analysis of the plugs data provides some interesting insights into the living patterns of the families living in the households. `Household 4` had high usage of the refrigerator throughout the day, with spikes in the usage of kitchen appliances at specific times, suggesting that the family makes breakfast, lunch, and dinner at specific times throughout the day. The family also watches TV after using the kitchen appliances which indicated that they watch TV after cooking food, and there is no usage of appliances from midnight to 6:00 am, which suggests that the family sleeps during this time. In contrast, `Household 5` had lower usage of the refrigerator but still higher than all other appliances, and a lower usage of Kitchen Appliances indicating that the family is busy and does not cook many meals at home. This family also watches more TV as compared to the family of Household 4, and there is no usage of appliances from 11:00 pm to 7:00 am, suggesting that the family sleeps during this time.

The consumption analysis based on day, holiday, and season reveals that the usage of `Entertainment` appliances is higher on weekends and holidays which is consistent with the observation that people tend to watch more TV during their leisure time. Moving to `Kitchen` appliances, there is a higher consumption on weekdays as compared to weekends. During weekdays, families go for work and usually get up on time, make food and carry tiffins and stay on the same routine throughout. But during weekends, they do not have to get up and work and can enjoy their day offs by going out for food and spending time with family more. This is not the case for Holidays. One would expect that on holidays, people will go out for food and hence the use of kitchen appliances will reduce. But it can also be assumed that, on holidays families have guests over or cook traditional dishes to celebrate the holiday and hence the use of kitchen appliances increases. Different seasons do not affect the use of kitchen appliances by a lot but there is a slight increase in the use of kitchen appliances during Fall and Winter as compared to Summer. For `Refrigerator`, there is a higher consumption on weekends as compared to weekdays. This is because people tend to go out more on weekends and hence the refrigerator is not used as much. The use of refrigerator is more in Summer and Fall as compared to Winter. This is fairly obvious, that people do not need to store food inside the refrigerator or eat colder food since the weather is already cold. Another interesting appliance is `Fountain`. It is observed that there is little to no variation in the usage of fountain for weekends, weekdays, holidays and seasons. But during fall and winter, there is no consumption by fountain since it is cold outside and the water will freeze and hence the fountain will not be used. Lastly for `Lamp`, there is a higher consumption during winter as compared to summer and fall. This is because people tend to stay indoors more during winter and hence the lamp is used more. This is also because the duration of daylight is less during winter and hence the lamp is used more to provide light.

Overall, the analysis provides some valuable insights into the electricity consumption patterns of the households and their living patterns, which can be used to develop energy-saving strategies and improve energy efficiency.