The ECO (Electricity Consumption and Occupancy) dataset is a valuable resource for non-intrusive load monitoring and occupancy detection research. The dataset was collected in six Swiss households over a period of eight months, providing a comprehensive view of energy consumption patterns and occupancy information. The ECO dataset is open-source and provided under the Creative Commons License CC BY 4.0, which means that researchers can freely use the data for non-commercial purposes, as long as they give credit to the authors. [Kleiminger, Beckel, and Santini (2015)](Beckel et al. 2014)
The ECO dataset contains three types of data: 1 Hz aggregate consumption data, 1 Hz plug-level data, and occupancy information. The 1 Hz aggregate consumption data contains data on current, voltage, and phase shift for each of the three phases in the household. This data was collected from the smart meter in each household and provides a high-level view of the overall energy consumption patterns in the home.
The 1 Hz plug-level data was collected from selected appliances in each household, providing a more detailed view of energy consumption patterns for specific devices. This data can be used to identify opportunities for energy savings and to develop more accurate load monitoring algorithms.
The occupancy information was collected through a tablet computer and a passive infrared sensor in some of the households. The tablet computer was used for manual labeling of occupancy information, while the passive infrared sensor provided automatic detection of occupancy information. This information can be used to develop occupancy detection algorithms that can be used to optimize energy consumption in the home.
For this project, I will focus on households 4, 5, and 6 and explore the dataset using Python and various visualization libraries.
II. Data Science Questions
What is the overall trend in electricity consumption for households 4, 5, and 6 over the 8-month period?
Which appliances consume the most electricity in each household, and how does this vary over time?
Are there any anomalies or outliers in the data that may require further investigation?
Is there a difference in electricity consumption patterns of appliances for weekdays and weekends?
Is there a difference in electricity consumption patterns of appliances for holidays and non-holidays?
Are the electricity consumption patterns of appliances related to seasons?
III. Data Preparation
The dataset is provided in the form of folders for each household (smart meter and plug-level data). [Kleiminger, Beckel, and Santini (2015)](Beckel et al. 2014)
For the smart meter data, each CSV file contains aggregate consumption data of a single day and is named accordingly (e.g., “2012-08-31.csv”). Each file contains 86,400 rows (i.e., one row for each second). Rows with missing measurements are denoted as “-1”. The coverage specifies the proportion of “existing” values, i.e., values that are > -1. The rows consist of the following comma separated values:
powerallphases: Sum of real power over all phases
powerl1: Real power phase 1
powerl2: Real power phase 2
powerl3: Real power phase 3
currentneutral: Neutral current
currentl1: Current phase 1
currentl2: Current phase 2
currentl3: Current phase 3
voltagel1: Voltage phase 1
voltagel2: Voltage phase 2
voltagel3: Voltage phase 3
phaseanglevoltagel2l1: Phase shift between voltage on phase 2 and 1
phaseanglevoltagel3l1: Phase shift between voltage on phase 3 and 1
phaseanglecurrentvoltagel1: Phase shift between current/voltage on phase 1
phaseanglecurrentvoltagel2: Phase shift between current/voltage on phase 2
phaseanglecurrentvoltagel3: Phase shift between current/voltage on phase 3
Like the smart meter data, the plug data is also provided on a daily based (i.e., 86,400 measurements per day) in CSV format. In contrast to the smart meter data, it contains only one value:
consumption: Real power measured by the plug
Each file contains plug measurements for the whole household. The (daily) data for each plug is stored in subfolders named after the plug ID (e.g., “05”) described above.
A. Import Libraries
Code
import pandas as pdimport numpy as npimport matplotlib.pyplot as pltimport seaborn as snsimport plotly.graph_objects as goimport plotly.express as pximport plotly.io as pioimport altair as altpio.renderers.default ="plotly_mimetype+notebook_connected"import globimport datetimeimport warningswarnings.filterwarnings("ignore")
B. Data Processing
The raw data needs to be processed and cleaned before it can be used for analysis.
i. Extracting Data and Creating Dataframe
The data is extracted from the CSV files and stored in a Pandas dataframe. 4 functions are used to extract the data from the CSV files and create the dataframe:
extract_plugs_data: Extracts the data from the CSV files of plug-level data and stores it in a Pandas dataframe.
extract_sm_data: Extracts the data from the CSV files of smart meter data and stores it in a Pandas dataframe.
create_plugs_df: Combines the data from the CSV files of plug-level data into a single dataframe.
create_sm_df: Combines the data from the CSV files of smart meter data into a single dataframe.
Code
# Define a function to extract the data from a CSV file and format it into a dataframedef extract_plugs_data(csv_file_path, household):""" Reads a CSV file located at the given file path and extracts plug-level data from it. It then adds additional columns to the DataFrame, including the date of the data, the household number, and the appliance type. Returns: _type_: pandas DataFrame """ df = pd.read_csv(csv_file_path, header=None) date_str = csv_file_path.split("/")[-1].replace(".csv", "") date = datetime.datetime.strptime(date_str, "%Y-%m-%d") subfolder = csv_file_path.split("/")[-2] df["Date"] = date df["Household"] = household df["Appliance"] = subfolder df.columns = ["Reading", "Date", "Household", "Appliance"]return dfdef extract_sm_data(csv_file_path, household):""" Reads a CSV file located at the given file path and extracts smart meter data from it. It then adds additional columns to the DataFrame, including the date of the data and the household number. Returns: _type_: pandas DataFrame """ df = pd.read_csv(csv_file_path, header=None) date_str = csv_file_path.split("/")[-1].replace(".csv", "") date = datetime.datetime.strptime(date_str, "%Y-%m-%d") df["Date"] = date df["Household"] = household df.columns = ["powerallphases","powerl1","powerl2","powerl3","currentneutral","currentl1","currentl2","currentl3","voltagel1","voltagel2","voltagel3","phaseanglevoltagel2l1","phaseanglevoltagel3l1","phaseanglecurrentvoltagel1","phaseanglecurrentvoltagel2","phaseanglecurrentvoltagel3","Date","Household", ]# Drop the columns that are not needed df.drop( ["powerl1","powerl2","powerl3","currentneutral","currentl1","currentl2","currentl3","voltagel1","voltagel2","voltagel3","phaseanglevoltagel2l1","phaseanglevoltagel3l1","phaseanglecurrentvoltagel1","phaseanglecurrentvoltagel2","phaseanglecurrentvoltagel3", ], axis=1, inplace=True, )return dfdef create_plugs_df(folder_list, household_list):""" This function takes in a list of folder paths and a corresponding list of household IDs as input. It then loops over each folder path, extracts data from CSV files in the subfolders, and creates a list of dataframes. These dataframes are then combined into a single dataframe using the pandas concat() function and returned as output. Returns: _type_: pandas DataFrame """# Create a list to store the extracted dataframes df_list = []# Loop over each folder path and extract the data from the CSV filesfor i inrange(len(folder_list)): folder_path = folder_list[i] household = household_list[i] subfolder_paths = glob.glob(folder_path +"/*")for subfolder_path in subfolder_paths: csv_files = glob.glob(subfolder_path +"/*.csv")for csv_file in csv_files: df = extract_plugs_data(csv_file, household) df_list.append(df)# Combine the extracted dataframes into a single dataframe plugs_df = pd.concat(df_list, ignore_index=True)return plugs_dfdef create_sm_df(folder_list, household_list):""" This function takes in a list of folder paths and a corresponding list of household IDs as input. It then loops over each folder path, extracts data from CSV files, and creates a list of dataframes. These dataframes are then combined into a single dataframe using the pandas concat() function and returned as output. Returns: _type_: pandas DataFrame """# Create a list to store the extracted dataframes df_list = []# Loop over each folder path and extract the data from the CSV filesfor i inrange(len(folder_list)): folder_path = folder_list[i] household = household_list[i] csv_files = glob.glob(folder_path +"/*.csv")for csv_file in csv_files: df = extract_sm_data(csv_file, household) df_list.append(df)# Combine the extracted dataframes into a single dataframe sm_df = pd.concat(df_list, ignore_index=True)return sm_df# Set the paths to the folders containing the datafolder_paths_sm = ["eco/04_sm", "eco/05_sm", "eco/06_sm"]folder_paths_plugs = ["eco/04_plugs", "eco/05_plugs", "eco/06_plugs"]household_nums = ["04", "05", "06"]# Store the data in a dataframesm_df = create_sm_df(folder_paths_sm, household_nums)plugs_df = create_plugs_df(folder_paths_plugs, household_nums)# Store it in new dataframes to preserve the original datadf1 = sm_df.copy()df2 = plugs_df.copy()
Raw Smart Meter data:
Code
df1.head()
powerallphases
Date
Household
0
344.297
2012-08-16
04
1
343.635
2012-08-16
04
2
344.199
2012-08-16
04
3
344.081
2012-08-16
04
4
344.183
2012-08-16
04
Raw Plugs Data:
Code
df2.head()
Reading
Date
Household
Appliance
0
0.00000
2012-08-16
04
03
1
0.00000
2012-08-16
04
03
2
0.00000
2012-08-16
04
03
3
2.23978
2012-08-16
04
03
4
0.00000
2012-08-16
04
03
ii. Data Cleaning and Feature Engineering
Dealing with missing values:
In the plugs level data, the Readings on 25th July 2012 are missing. All the rows of the 25th July 2012 are removed from the data. Every other missing value is imputed as -1 (as per the dataset description [Kleiminger, Beckel, and Santini (2015)](Beckel et al. 2014)). These values are imputed as 0 to avoid any anomalies while doing analysis.
Code
df1["powerallphases"][df1["powerallphases"] ==-1] =0df2["Reading"][df2["Reading"] ==-1] =0# Drop all rows with Date as 2012-06-25df2 = df2[df2["Date"] !="2012-06-25"]
Down-sizing the data:
Since for each day, there are 86,400 rows, the data should be reduced so that it is not computationallly expensive to do analysis. Here is the shape of the 2 dataframes before downsizing:
Code
print("The shape of Smart Meter data before downsizing is:", sm_df.shape)print("The shape of Plugs level data before downsizing is:", plugs_df.shape)
The shape of Smart Meter data before downsizing is: (51840000, 3)
The shape of Plugs level data before downsizing is: (344908800, 4)
For down-sizing, the 86,400 rows are grouped into 24 groups of 3600 rows each. The mean of each Readings are taken and stored in a new dataframe. This is done for both the smart meter and plug level data.
Code
# Create a new column that contains the hour of the daydf1["Hour"] = (df1.groupby(["Household", "Date"]).cumcount() //3600) +1df1 = df1.groupby(["Household", "Date", "Hour"], as_index=False).mean()df2["Hour"] = (df2.groupby(["Household", "Date", "Appliance"]).cumcount() //3600) +1df2 = df2.groupby(["Household", "Date", "Appliance", "Hour"], as_index=False).mean()
Here is the shape of the 2 dataframes after downsizing:
Code
print("The shape of Smart Meter data after downsizing is:", df1.shape)print("The shape of Plugs level data after downsizing is:", df2.shape)
The shape of Smart Meter data after downsizing is: (14400, 4)
The shape of Plugs level data after downsizing is: (95784, 5)
Feature Engineering:
Many meaningful data can be extracted from the existing columns.
Since the Date column is already is datetime format, the Day of the week can be extracted from it. From this day of the week, we can extract the weekend and weekday information. This information can be used to see if there is any difference in the electricity consumption on weekends and weekdays.
Code
df1["Day"] =""for i inrange(len(df1)): day = df1["Date"][i].weekday()if day <5: df1["Day"][i] ="Weekday"else: df1["Day"][i] ="Weekend"df2["Day"] =""for i inrange(len(df2)): day = df2["Date"][i].weekday()if day <5: df2["Day"][i] ="Weekday"else: df2["Day"][i] ="Weekend"
The Date column can also be used to add Holiday column. This column can be used to see if there is any difference in the electricity consumption on holidays and non-holidays.
# Make a list for the holidays using above markdown cellholiday_list = ["2012-06-07","2012-06-23","2012-06-29","2012-08-01","2012-08-15","2012-09-06","2012-09-16","2012-09-17","2012-09-22","2012-09-25","2012-11-01","2012-12-08","2012-12-25","2012-12-26","2012-12-31","2013-01-01","2013-01-02","2013-01-06",]# Create a new column that contains the holiday informationdf1["Holiday"] =""for i inrange(len(df1)):if df1["Date"][i].strftime("%Y-%m-%d") in holiday_list: df1["Holiday"][i] ="Holiday"else: df1["Holiday"][i] ="Non-Holiday"df2["Holiday"] =""for i inrange(len(df2)):if df2["Date"][i].strftime("%Y-%m-%d") in holiday_list: df2["Holiday"][i] ="Holiday"else: df2["Holiday"][i] ="Non-Holiday"
In Switzerland, the approximate season dates for 2012 and 2013 are as follows:
Spring 2012: March 20 - June 19
Summer 2012: June 20 - September 21
Fall 2012: September 22 - December 1
Winter 2012: December 2, 2012 - March 19, 2013
Code
# Define a custom function to map dates to seasonsdef get_season(date):""" Creates a new column that contains the season information. """ spring_start = pd.Timestamp(year=date.year, month=3, day=20) summer_start = pd.Timestamp(year=date.year, month=6, day=20) fall_start = pd.Timestamp(year=date.year, month=9, day=22) winter_start = pd.Timestamp(year=date.year, month=12, day=2)if date < spring_start or date >= winter_start:return"Winter"elif date < summer_start:return"Spring"elif date < fall_start:return"Summer"else:return"Fall"# Apply the function to the "Date" column and create a new "Season" columndf1["Season"] = df1["Date"].apply(get_season)df2["Season"] = df2["Date"].apply(get_season)
Convert the Appliance types from 01, 02 and so on to their respective appliance names for better understanding the data:
print("------------------------------------\n")print("Smart Meter Data: \n\n", df1.describe())print("\n------------------------------------\n")print("Plugs Data: \n\n", df2.describe())print("\n------------------------------------")
------------------------------------
Smart Meter Data:
Hour powerallphases
count 14400.000000 14400.000000
mean 12.500000 642.279757
std 6.922427 725.089139
min 1.000000 0.000000
25% 6.750000 238.821080
50% 12.500000 452.466749
75% 18.250000 777.330296
max 24.000000 6967.335401
------------------------------------
Plugs Data:
Hour Reading
count 65952.000000 65952.000000
mean 12.500000 21.684298
std 6.922239 34.054088
min 1.000000 0.000000
25% 6.750000 2.120461
50% 12.500000 9.215207
75% 18.250000 20.061170
max 24.000000 793.706827
------------------------------------
B. Visual Analysis
i. Time Series Analysis
The time series analysis is done on the smart meter data. It is done to see what are the trends of the total electricity consumption over time of all 3 households.
Code
# Create a new dataframedf_plot1 = df1.groupby(["Date", "Household"])["powerallphases"].mean().reset_index()# Create a figure objectfig = go.Figure()for household in df_plot1["Household"].unique():# Add a trace for each household fig.add_trace( go.Scatter( x=df_plot1[df_plot1["Household"] == household]["Date"], y=df_plot1[df_plot1["Household"] == household]["powerallphases"], mode="lines", name=household, ) )# Update the layout of the plotfig.update_xaxes(title=dict(text="Date", font=dict(size=15)), tickformat="%Y/%m/%d")fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)))fig.update_layout( title=dict(text="Total Power Consumption over Time", font=dict(size=20)), legend_title_text="Household", legend=dict(x=0.905, y=0.99), template="ggplot2",)# Display the plotfig.show()
Observations:
The total electricity consumption increases over time (from July to January) for all the 3 households.
If we look at the individual households, we can see that the total electricity consumption of household 4 is the highest, followed by household 5 and then household 6. The total electricity consumption of household 6 is the lowest. This puts forward many questions like why is the total electricity consumption of household 3 the lowest? Is it because the household is not using any appliances? Is it because the household is using appliances that consume less electricity? Is it because there are less people in the household? and many more.
It is also clear that Household 6 has almost constant electricity consumed during the period of 10 November 2012 to 4 January 2013.
For Household 4, on 13 December 2012, the total electricity consumption is very high. These can be explored by doing further analysis.
Further Analysis for Households 4 and 6:
It is evident from the above plot that there are some anomalies in the data for households 4 and 6. To further analyze these anomalies, the Plugs level data for these 2 households is seen separately.
Household 4:
On 13 December 2012, the total electricity consumption is very high. Plotting the electricity consumption for each appliance for that day might give some insight into the reason for the high electricity consumption.
Code
# Create a new dataframedf_plot2 = df2[["Appliance", "Hour", "Reading"]][ (df2["Household"] =="04") & (df2["Date"] =="2012-12-13")]# Create a figure objectfig = go.Figure()# Define a list of colors for each appliancecolors = ["#ff6664", "#ffcc68", "#a86ccd", "#67e068", "#7591bd"]# Add a trace for each hourfor hour in df_plot2["Hour"].unique(): fig.add_trace( go.Bar( x=df_plot2[df_plot2["Hour"] == hour]["Appliance"], y=df_plot2[df_plot2["Hour"] == hour]["Reading"], name=str(hour), visible=False, marker_color=colors, ) )# Update the layout of the plotfig.update_xaxes(title=dict(text="Appliance", font=dict(size=15)))fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])fig.update_layout( title=dict( text="Power Consumption of Each Appliance (for Household 4) for Hour 1 <br><sup>Note: The power consumption is only for 13 December 2012</sup>", font=dict(size=20), ), template="ggplot2",)fig.data[0].visible =True# Create and add slidersteps = []for i inrange(len(fig.data)): step =dict( method="update", args=[ {"visible": [False] *len(fig.data)}, {"title": "Power Consumption of Each Appliance (for Household 4) for Hour "+str(i +1) +"<br><sup>Note: The power consumption is only for 13 December 2012</sup>" }, ], # layout attribute label=i +1, ) step["args"][0]["visible"][i] =True# Toggle i'th trace to "visible" steps.append(step)sliders = [dict(active=0, currentvalue={"prefix": "Hour: "}, pad={"t": 30}, steps=steps)]# Update the layoutfig.update_layout(sliders=sliders)# Display the plotfig.show()
Observations:
The electricity consumption for Refrigerator is high throughout the day.
Starting from 2:00 PM (Hour 14), it can be seen that there is a high usage of Entertainment appliances which include TV and Stereo.
At 7:00 PM (Hour 19), there is a unusually high spike in the electricity consumption for Kitchen appliances.
Household 6:
From 10 November 2012 to 4 January 2013, the total electricity consumption is almost constant. Plotting the electricity consumption for each appliance for that period might give some insight into the reason for the constant electricity consumption.
Code
# Create a new dataframedf_plot3 = df2[["Date", "Appliance", "Hour", "Reading"]][ (df2["Household"] =="06")& (df2["Date"] >"2012-11-09")& (df2["Date"] <"2013-01-05")]df_plot3 = df_plot3.groupby(["Date", "Appliance"])["Reading"].mean().reset_index()# Create a figure objectfig = go.Figure()# Define a list of colors for each appliancecolors = ["#ff6664", "#ffcc68", "#a86ccd", "#67e068", "#7591bd"]i =0# Add a trace for each appliancefor appliance in df_plot3["Appliance"].unique(): fig.add_trace( go.Bar( x=df_plot3[df_plot3["Appliance"] == appliance]["Date"], y=df_plot3[df_plot3["Appliance"] == appliance]["Reading"], name=appliance, visible=False, marker_color=colors[i], ) ) i = i+1# Update the layout of the plotfig.update_xaxes(title=dict(text="Date", font=dict(size=15)), tickformat="%Y/%m/%d")fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])fig.update_layout( title=dict( text="Average Power Consumption of Electronics (for Household 6) <br><sup>Note: The power consumption is only from 10 November 2012 to 4 January 2013</sup>", font=dict(size=20), ), template="ggplot2",)fig.data[0].visible =Truebutton_list = []for i inrange(len(fig.data)): button_dict =dict( label = fig.data[i].name, method="update", args=[{"visible": [False] *len(fig.data)}, {"title": "Average Power Consumption of "+ fig.data[i].name +" (for Household 6) <br><sup>Note: The power consumption is only from 10 November 2012 to 4 January 2013</sup>"}] # layout attribute ) button_dict["args"][0]["visible"][i] =True# Toggle i'th trace to "visible" button_list.append(button_dict)fig.update_layout( updatemenus=[dict( active=0, buttons=button_list, direction="down", pad={"b": 20, "t": 20}, showactive=True, x=0.86, xanchor="left", y=1.21, yanchor="top" ), ])# Display the plotfig.show()
Observations:
Every appliance except Entertainment appliances has constant and very low consumption throughout the period.
The Entertainment appliances have a constant and low consumption but comparatively higher than the other appliances.
Another interesting observation is that since the Lamp is triggered by the Motion Sensor, it might be possible that there is some defect in the Lamp since all other observations conclude to the family not being at home.
ii. Appliance Analysis
The appliance analysis is done on the plugs data. It is done to see what are the trends of the electricity consumption of each appliance over time for each household. This also helps in identifying some living patterns of the families living in the households.
Living trends analysis:
To identify the living trends of the families living in the households, the electricity consumption of each appliance is analyzed for each household.
Code
# Create a new dataframedf_plot4 = df2.groupby(["Hour", "Appliance", "Household"]).mean().reset_index()# Create a figure objectfig = go.Figure()# Define listscolors = ["#ff6664", "#ffcc68", "#fee664", "#a86ccd", "#67e068", "#7591bd"]households = df_plot4["Household"].unique()hours = df_plot4["Hour"].unique()# Add a trace for each appliancefor household in households:for hour in hours: fig.add_trace( go.Bar( x=df_plot4[ (df_plot4["Household"] == household) & (df_plot4["Hour"] == hour) ]["Appliance"], y=df_plot4[ (df_plot4["Household"] == household) & (df_plot4["Hour"] == hour) ]["Reading"], name=appliance +" - Household "+ household, visible=False, marker_color=colors, ) )# Update the layout of the plotfig.update_xaxes(title=dict(text="Appliance", font=dict(size=15)))fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])fig.update_layout( title=dict( text="Average Power Consumption of Appliances for at Hour 1 <br><br>", font=dict(size=20), ), template="ggplot2",)fig.data[0].visible =True# Create slider steps for each householdslider_steps = []for i, household inenumerate(households): step = []for j, hour inenumerate(hours): visible = [False] *len(fig.data) visible[i *len(hours) + j] =True step.append({"label": j+1,"method": "update", "args": [{"visible": visible}, {"title": "Average Power Consumption of Appliances at Hour "+str(hour) +"<br><br>"} ] } ) slider_steps.append(step)# Create a slider for each householdsliders = []for i, household inenumerate(households): slider = [dict( active=0, currentvalue={"prefix": "Hour: "}, pad={"t": 30}, steps=slider_steps[i], ) ] sliders.append(slider)# Add the slider for the first household to the figure layoutfig.update_layout(sliders=sliders[0])# Create a list of visible traces for each householdvisible_list = []for i inrange(0, len(fig.data), len(hours)): visible = [False] *len(fig.data) visible[i] =True visible_list.append(visible)# Create a button for each continentbuttons = []for i, household inenumerate(households): button =dict( label="Household "+str(int(household)), method="update", args=[{"visible": visible_list[i]}, {"sliders": sliders[i]}], ) buttons.append(button)# Add the buttons to the figure layoutfig.update_layout( updatemenus=[dict(type="buttons", active=0, buttons=buttons, direction="right", pad={"r": 10, "t": 15}, x=0.355, xanchor="left", y=1.2, yanchor="top", ), ])# Show the figurefig.show()
Observations:
Household 4: There is high usage of Refrigerator throughout the day. It can also be seen there are spikes in the usage of Kitchen appliances at 6:00 AM, 12:00 PM and 7:00 PM. Adding on to that, Entertainment appliances are used just after the Kitchen appliances. There is also no use of appliances from 12:00 AM to 6:00 AM.
Household 5: As compared to Household 4, the use of Refrigerator is low but still higher that all their other appliances. There is less usage of Kitchen appliances as compared to Household 4. In contrast to household 4, this family watches more TV starting from 6:00 PM since there is a lot of usage for Entertainment appliances. There is also no use of appliances from 11:00 PM to 7:00 AM.
Household 6: All the appliances used have a very low average power consumption throughout the day. Hence it is difficult to make any conclusions about the living patterns of the family living in this household. Hence I will not be doing any further analysis on this household.
Consumption Analysis based on Day, Holiday and Season:
To analyze the usage of appliances on weekdays, weekends, different seasons or holidays, the data is grouped by the respective categories and the average power consumption of each appliance is calculated. The data is then plotted to see the trends.
For Entertainment appliances, there is a higher consumption on weekends as compared to weekdays. This is also the case for Holidays and Non-Holidays. During summer, the consumption of electricity by entertainment appliances is far less than the other seasons.
For Refrigerator, there is a higher consumption on weekends as compared to weekdays. The use of refrigerator is more in Summer and Fall as compared to Winter.
Moving to Kitchen appliances, there is a higher consumption on weekdays as compared to weekends. This is not the case for Holidays. Different seasons do not affect the use of kitchen appliances by a lot but there is a slight increase in the use of kitchen appliances during Fall and Winter as compared to Summer.
Another interesting appliance is Fountain. It is observed that there is little to no variation in the usage of fountain for weekends, weekdays, holidays and seasons. But during fall and winter, there is no consumption by fountain.
For Lamp, there is a higher consumption during winter as compared to summer and fall.
V. Results and Inferences
It is observed that the total electricity consumption increases over time for all three households. However, the total electricity consumption of household 4 is the highest, followed by household 5 and then household 6, with household 6 having the lowest consumption.
Further analysis of the plugs level data for households 4 and 6 reveals some anomalies in the data. On 13 December 2012, household 4 had a very high electricity consumption, and plotting the electricity consumption for each appliance for that day revealed that the high consumption was due to high usage of entertainment and kitchen appliances, possibly because the family had guests over for dinner. On the other hand, household 6 had almost constant electricity consumption from 10 November 2012 to 4 January 2013. The analysis of the electricity consumption for each appliance during this period reveals that every appliance except entertainment appliances had constant and very low consumption, which suggests that the family might have been away from home during this period.
The appliance analysis of the plugs data provides some interesting insights into the living patterns of the families living in the households. Household 4 had high usage of the refrigerator throughout the day, with spikes in the usage of kitchen appliances at specific times, suggesting that the family makes breakfast, lunch, and dinner at specific times throughout the day. The family also watches TV after using the kitchen appliances which indicated that they watch TV after cooking food, and there is no usage of appliances from midnight to 6:00 am, which suggests that the family sleeps during this time. In contrast, Household 5 had lower usage of the refrigerator but still higher than all other appliances, and a lower usage of Kitchen Appliances indicating that the family is busy and does not cook many meals at home. This family also watches more TV as compared to the family of Household 4, and there is no usage of appliances from 11:00 pm to 7:00 am, suggesting that the family sleeps during this time.
The consumption analysis based on day, holiday, and season reveals that the usage of Entertainment appliances is higher on weekends and holidays which is consistent with the observation that people tend to watch more TV during their leisure time. Moving to Kitchen appliances, there is a higher consumption on weekdays as compared to weekends. During weekdays, families go for work and usually get up on time, make food and carry tiffins and stay on the same routine throughout. But during weekends, they do not have to get up and work and can enjoy their day offs by going out for food and spending time with family more. This is not the case for Holidays. One would expect that on holidays, people will go out for food and hence the use of kitchen appliances will reduce. But it can also be assumed that, on holidays families have guests over or cook traditional dishes to celebrate the holiday and hence the use of kitchen appliances increases. Different seasons do not affect the use of kitchen appliances by a lot but there is a slight increase in the use of kitchen appliances during Fall and Winter as compared to Summer. For Refrigerator, there is a higher consumption on weekends as compared to weekdays. This is because people tend to go out more on weekends and hence the refrigerator is not used as much. The use of refrigerator is more in Summer and Fall as compared to Winter. This is fairly obvious, that people do not need to store food inside the refrigerator or eat colder food since the weather is already cold. Another interesting appliance is Fountain. It is observed that there is little to no variation in the usage of fountain for weekends, weekdays, holidays and seasons. But during fall and winter, there is no consumption by fountain since it is cold outside and the water will freeze and hence the fountain will not be used. Lastly for Lamp, there is a higher consumption during winter as compared to summer and fall. This is because people tend to stay indoors more during winter and hence the lamp is used more. This is also because the duration of daylight is less during winter and hence the lamp is used more to provide light.
Overall, the analysis provides some valuable insights into the electricity consumption patterns of the households and their living patterns, which can be used to develop energy-saving strategies and improve energy efficiency.
References
Beckel, Christian, Wilhelm Kleiminger, Romano Cicchetti, Thorsten Staake, and Silvia Santini. 2014. “The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms.”Proceedings of the 1st ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014).
Kleiminger, Wilhelm, Christian Beckel, and Silvia Santini. 2015. “Household Occupancy Monitoring Using Electricity Meters.”Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015).
“Switzerland Holidays.” n.d.
Source Code
---title: <b>Electricity Consumption Analysis</b>author: Ramdayal Rewariaformat: html: self-contained: true embed-resources: true page-layout: full code-fold: true toc: true toc-depth: 6 toc-expand: 4 toc-location: left theme: lumenexecute: echo: truebibliography: bibliography.bibjupyter: python3---# I. DataThe ECO (Electricity Consumption and Occupancy) dataset is a valuable resource for non-intrusive load monitoring and occupancy detection research. The dataset was collected in six Swiss households over a period of eight months, providing a comprehensive view of energy consumption patterns and occupancy information. The ECO dataset is open-source and provided under the Creative Commons License CC BY 4.0, which means that researchers can freely use the data for non-commercial purposes, as long as they give credit to the authors. [@Paper1][@Paper2]The ECO dataset contains three types of data: 1 Hz aggregate consumption data, 1 Hz plug-level data, and occupancy information. The 1 Hz aggregate consumption data contains data on current, voltage, and phase shift for each of the three phases in the household. This data was collected from the smart meter in each household and provides a high-level view of the overall energy consumption patterns in the home.The 1 Hz plug-level data was collected from selected appliances in each household, providing a more detailed view of energy consumption patterns for specific devices. This data can be used to identify opportunities for energy savings and to develop more accurate load monitoring algorithms.The occupancy information was collected through a tablet computer and a passive infrared sensor in some of the households. The tablet computer was used for manual labeling of occupancy information, while the passive infrared sensor provided automatic detection of occupancy information. This information can be used to develop occupancy detection algorithms that can be used to optimize energy consumption in the home.For this project, I will focus on households 4, 5, and 6 and explore the dataset using Python and various visualization libraries.# II. Data Science Questions1. What is the overall trend in electricity consumption for households 4, 5, and 6 over the 8-month period?2. Which appliances consume the most electricity in each household, and how does this vary over time?3. Are there any anomalies or outliers in the data that may require further investigation?4. Is there a difference in electricity consumption patterns of appliances for weekdays and weekends? 5. Is there a difference in electricity consumption patterns of appliances for holidays and non-holidays?6. Are the electricity consumption patterns of appliances related to seasons?# III. Data PreparationThe dataset is provided in the form of folders for each household (smart meter and plug-level data). [@Paper1][@Paper2]For the smart meter data, each CSV file contains aggregate consumption data of a single day and is named accordingly (e.g., "2012-08-31.csv"). Each file contains 86,400 rows (i.e., one row for each second). Rows with missing measurements are denoted as "-1". The coverage specifies the proportion of "existing" values, i.e., values that are > -1. The rows consist of the following comma separated values:* powerallphases: Sum of real power over all phases* powerl1: Real power phase 1* powerl2: Real power phase 2* powerl3: Real power phase 3* currentneutral: Neutral current* currentl1: Current phase 1* currentl2: Current phase 2* currentl3: Current phase 3* voltagel1: Voltage phase 1* voltagel2: Voltage phase 2* voltagel3: Voltage phase 3* phaseanglevoltagel2l1: Phase shift between voltage on phase 2 and 1* phaseanglevoltagel3l1: Phase shift between voltage on phase 3 and 1* phaseanglecurrentvoltagel1: Phase shift between current/voltage on phase 1* phaseanglecurrentvoltagel2: Phase shift between current/voltage on phase 2 * phaseanglecurrentvoltagel3: Phase shift between current/voltage on phase 3Like the smart meter data, the plug data is also provided on a daily based (i.e., 86,400 measurements per day) in CSV format. In contrast to the smart meter data, it contains only one value:* consumption: Real power measured by the plugEach file contains plug measurements for the whole household. The (daily) data for each plug is stored in subfolders named after the plug ID (e.g., "05") described above.## A. Import Libraries```{python}import pandas as pdimport numpy as npimport matplotlib.pyplot as pltimport seaborn as snsimport plotly.graph_objects as goimport plotly.express as pximport plotly.io as pioimport altair as altpio.renderers.default ="plotly_mimetype+notebook_connected"import globimport datetimeimport warningswarnings.filterwarnings("ignore")```## B. Data ProcessingThe raw data needs to be processed and cleaned before it can be used for analysis.### i. Extracting Data and Creating DataframeThe data is extracted from the CSV files and stored in a Pandas dataframe. 4 functions are used to extract the data from the CSV files and create the dataframe:* **extract_plugs_data**: Extracts the data from the CSV files of plug-level data and stores it in a Pandas dataframe.* **extract_sm_data**: Extracts the data from the CSV files of smart meter data and stores it in a Pandas dataframe.* **create_plugs_df**: Combines the data from the CSV files of plug-level data into a single dataframe.* **create_sm_df**: Combines the data from the CSV files of smart meter data into a single dataframe.```{python}# Define a function to extract the data from a CSV file and format it into a dataframedef extract_plugs_data(csv_file_path, household):""" Reads a CSV file located at the given file path and extracts plug-level data from it. It then adds additional columns to the DataFrame, including the date of the data, the household number, and the appliance type. Returns: _type_: pandas DataFrame """ df = pd.read_csv(csv_file_path, header=None) date_str = csv_file_path.split("/")[-1].replace(".csv", "") date = datetime.datetime.strptime(date_str, "%Y-%m-%d") subfolder = csv_file_path.split("/")[-2] df["Date"] = date df["Household"] = household df["Appliance"] = subfolder df.columns = ["Reading", "Date", "Household", "Appliance"]return dfdef extract_sm_data(csv_file_path, household):""" Reads a CSV file located at the given file path and extracts smart meter data from it. It then adds additional columns to the DataFrame, including the date of the data and the household number. Returns: _type_: pandas DataFrame """ df = pd.read_csv(csv_file_path, header=None) date_str = csv_file_path.split("/")[-1].replace(".csv", "") date = datetime.datetime.strptime(date_str, "%Y-%m-%d") df["Date"] = date df["Household"] = household df.columns = ["powerallphases","powerl1","powerl2","powerl3","currentneutral","currentl1","currentl2","currentl3","voltagel1","voltagel2","voltagel3","phaseanglevoltagel2l1","phaseanglevoltagel3l1","phaseanglecurrentvoltagel1","phaseanglecurrentvoltagel2","phaseanglecurrentvoltagel3","Date","Household", ]# Drop the columns that are not needed df.drop( ["powerl1","powerl2","powerl3","currentneutral","currentl1","currentl2","currentl3","voltagel1","voltagel2","voltagel3","phaseanglevoltagel2l1","phaseanglevoltagel3l1","phaseanglecurrentvoltagel1","phaseanglecurrentvoltagel2","phaseanglecurrentvoltagel3", ], axis=1, inplace=True, )return dfdef create_plugs_df(folder_list, household_list):""" This function takes in a list of folder paths and a corresponding list of household IDs as input. It then loops over each folder path, extracts data from CSV files in the subfolders, and creates a list of dataframes. These dataframes are then combined into a single dataframe using the pandas concat() function and returned as output. Returns: _type_: pandas DataFrame """# Create a list to store the extracted dataframes df_list = []# Loop over each folder path and extract the data from the CSV filesfor i inrange(len(folder_list)): folder_path = folder_list[i] household = household_list[i] subfolder_paths = glob.glob(folder_path +"/*")for subfolder_path in subfolder_paths: csv_files = glob.glob(subfolder_path +"/*.csv")for csv_file in csv_files: df = extract_plugs_data(csv_file, household) df_list.append(df)# Combine the extracted dataframes into a single dataframe plugs_df = pd.concat(df_list, ignore_index=True)return plugs_dfdef create_sm_df(folder_list, household_list):""" This function takes in a list of folder paths and a corresponding list of household IDs as input. It then loops over each folder path, extracts data from CSV files, and creates a list of dataframes. These dataframes are then combined into a single dataframe using the pandas concat() function and returned as output. Returns: _type_: pandas DataFrame """# Create a list to store the extracted dataframes df_list = []# Loop over each folder path and extract the data from the CSV filesfor i inrange(len(folder_list)): folder_path = folder_list[i] household = household_list[i] csv_files = glob.glob(folder_path +"/*.csv")for csv_file in csv_files: df = extract_sm_data(csv_file, household) df_list.append(df)# Combine the extracted dataframes into a single dataframe sm_df = pd.concat(df_list, ignore_index=True)return sm_df# Set the paths to the folders containing the datafolder_paths_sm = ["eco/04_sm", "eco/05_sm", "eco/06_sm"]folder_paths_plugs = ["eco/04_plugs", "eco/05_plugs", "eco/06_plugs"]household_nums = ["04", "05", "06"]# Store the data in a dataframesm_df = create_sm_df(folder_paths_sm, household_nums)plugs_df = create_plugs_df(folder_paths_plugs, household_nums)# Store it in new dataframes to preserve the original datadf1 = sm_df.copy()df2 = plugs_df.copy()```Raw Smart Meter data:```{python}df1.head()```Raw Plugs Data:```{python}df2.head()```### ii. Data Cleaning and Feature Engineering_Dealing with missing values:_In the plugs level data, the Readings on 25th July 2012 are missing. All the rows of the 25th July 2012 are removed from the data. Every other missing value is imputed as -1 (as per the dataset description [@Paper1][@Paper2]). These values are imputed as 0 to avoid any anomalies while doing analysis.```{python}df1["powerallphases"][df1["powerallphases"] ==-1] =0df2["Reading"][df2["Reading"] ==-1] =0# Drop all rows with Date as 2012-06-25df2 = df2[df2["Date"] !="2012-06-25"]```_Down-sizing the data:_Since for each day, there are 86,400 rows, the data should be reduced so that it is not computationallly expensive to do analysis. Here is the shape of the 2 dataframes before downsizing:```{python}print("The shape of Smart Meter data before downsizing is:", sm_df.shape)print("The shape of Plugs level data before downsizing is:", plugs_df.shape)```For down-sizing, the 86,400 rows are grouped into 24 groups of 3600 rows each. The mean of each Readings are taken and stored in a new dataframe. This is done for both the smart meter and plug level data.```{python}# Create a new column that contains the hour of the daydf1["Hour"] = (df1.groupby(["Household", "Date"]).cumcount() //3600) +1df1 = df1.groupby(["Household", "Date", "Hour"], as_index=False).mean()df2["Hour"] = (df2.groupby(["Household", "Date", "Appliance"]).cumcount() //3600) +1df2 = df2.groupby(["Household", "Date", "Appliance", "Hour"], as_index=False).mean()```Here is the shape of the 2 dataframes after downsizing:```{python}print("The shape of Smart Meter data after downsizing is:", df1.shape)print("The shape of Plugs level data after downsizing is:", df2.shape)```_Feature Engineering:_Many meaningful data can be extracted from the existing columns.Since the Date column is already is datetime format, the Day of the week can be extracted from it. From this day of the week, we can extract the weekend and weekday information. This information can be used to see if there is any difference in the electricity consumption on weekends and weekdays.```{python}df1["Day"] =""for i inrange(len(df1)): day = df1["Date"][i].weekday()if day <5: df1["Day"][i] ="Weekday"else: df1["Day"][i] ="Weekend"df2["Day"] =""for i inrange(len(df2)): day = df2["Date"][i].weekday()if day <5: df2["Day"][i] ="Weekday"else: df2["Day"][i] ="Weekend"```The Date column can also be used to add Holiday column. This column can be used to see if there is any difference in the electricity consumption on holidays and non-holidays._List of Holidays in Switzerland for the period of June 2012 to January 2013 [@holiday]:_| Date | Day | Holiday Name | Type of Holiday ||-------------|-----------|------------------------------|----------------------|| Jun 7 | Thursday | Corpus Christi | Common local holiday || Jun 23 | Saturday | Jura Independence Day | Common local holiday || Jun 29 | Friday | St. Peter and St. Paul | Common local holiday || Aug 1 | Wednesday| Swiss National Day | National holiday || Aug 15 | Wednesday| Assumption of Mary | Common local holiday || Sep 6 | Thursday | Genevese Fast | Common local holiday || Sep 16 | Sunday | Swiss Federal Fast | Common local holiday || Sep 17 | Monday | Federal Fast Monday | Common local holiday || Sep 22 | Saturday | Mauritius Day | Common local holiday || Sep 25 | Tuesday | St. Nicholas of Flüe | Common local holiday || Nov 1 | Thursday | All Saints' Day | Common local holiday || Dec 8 | Saturday | Immaculate Conception | Common local holiday || Dec 25 | Tuesday | Christmas Day | Common local holiday || Dec 26 | Wednesday| St. Stephen's Day | Common local holiday || Dec 31 | Monday | Restoration Day | Common local holiday || Jan 1 | Tuesday | New Year's Day | Common local holiday || Jan 2 | Wednesday| Berchtold Day | Common local holiday || Jan 6 | Sunday | Epiphany | Common local holiday |```{python}# Make a list for the holidays using above markdown cellholiday_list = ["2012-06-07","2012-06-23","2012-06-29","2012-08-01","2012-08-15","2012-09-06","2012-09-16","2012-09-17","2012-09-22","2012-09-25","2012-11-01","2012-12-08","2012-12-25","2012-12-26","2012-12-31","2013-01-01","2013-01-02","2013-01-06",]# Create a new column that contains the holiday informationdf1["Holiday"] =""for i inrange(len(df1)):if df1["Date"][i].strftime("%Y-%m-%d") in holiday_list: df1["Holiday"][i] ="Holiday"else: df1["Holiday"][i] ="Non-Holiday"df2["Holiday"] =""for i inrange(len(df2)):if df2["Date"][i].strftime("%Y-%m-%d") in holiday_list: df2["Holiday"][i] ="Holiday"else: df2["Holiday"][i] ="Non-Holiday"```In Switzerland, the approximate season dates for 2012 and 2013 are as follows:* Spring 2012: March 20 - June 19* Summer 2012: June 20 - September 21* Fall 2012: September 22 - December 1* Winter 2012: December 2, 2012 - March 19, 2013```{python}# Define a custom function to map dates to seasonsdef get_season(date):""" Creates a new column that contains the season information. """ spring_start = pd.Timestamp(year=date.year, month=3, day=20) summer_start = pd.Timestamp(year=date.year, month=6, day=20) fall_start = pd.Timestamp(year=date.year, month=9, day=22) winter_start = pd.Timestamp(year=date.year, month=12, day=2)if date < spring_start or date >= winter_start:return"Winter"elif date < summer_start:return"Spring"elif date < fall_start:return"Summer"else:return"Fall"# Apply the function to the "Date" column and create a new "Season" columndf1["Season"] = df1["Date"].apply(get_season)df2["Season"] = df2["Date"].apply(get_season)```_Convert the Appliance types from 01, 02 and so on to their respective appliance names for better understanding the data:_```{python}for i inrange(len(df2)):if df2["Household"][i] =="04": appliance_types = {"Fridge": "01","Kitchen appliances": "02","Lamp": "03","Stereo and laptop": "04","Freezer": "05","Tablet": "06","Entertainment": "07","Microwave": "08", } df2["Appliance"][i] =list(appliance_types.keys())[list(appliance_types.values()).index(df2["Appliance"][i]) ]elif df2["Household"][i] =="05": appliance_types = {"Tablet": "01","Coffee machine": "02","Fountain": "03","Microwave": "04","Fridge": "05","Entertainment": "06","PC": "07","Kettle": "08", } df2["Appliance"][i] =list(appliance_types.keys())[list(appliance_types.values()).index(df2["Appliance"][i]) ]elif df2["Household"][i] =="06": appliance_types = {"Lamp": "01","Laptop": "02","Router": "03","Coffee machine": "04","Entertainment": "05","Fridge": "06","Kettle": "07", } df2["Appliance"][i] =list(appliance_types.keys())[list(appliance_types.values()).index(df2["Appliance"][i]) ]else:pass```_I noticed that some appliances are redundant and can be grouped together for better analysis._ The unique appliance types are as follows:```{python}list(df2["Appliance"].unique())```Appliances like:* `Microwave`, `Coffee machine` and `Kettle` can be combined into `Kitchen appliances`* `Stereo and laptop`, `Tablet`, `PC`, `Router` and `Laptop` can be combined into `Electronics`* `Fridge` and `Freezer` can be combined into `Refrigerator````{python}for i inrange(len(df2)):if (df2["Appliance"][i] =="Fridge") | (df2["Appliance"][i] =="Freezer"): df2["Appliance"][i] ="Refrigerator"elif ( (df2["Appliance"][i] =="Stereo and laptop")| (df2["Appliance"][i] =="Laptop")| (df2["Appliance"][i] =="PC")| (df2["Appliance"][i] =="Tablet")| (df2["Appliance"][i] =="Router") ): df2["Appliance"][i] ="Electronics"elif ( (df2["Appliance"][i] =="Microwave")| (df2["Appliance"][i] =="Coffee machine")| (df2["Appliance"][i] =="Kettle")| (df2["Appliance"][i] =="Kitchen appliances") ): df2["Appliance"][i] ="Kitchen"else:pass```The updated list of appliances are as follows:```{python}df2 = ( df2.groupby(["Household", "Date", "Appliance", "Hour", "Day", "Holiday", "Season"]) .mean() .reset_index())list(df2["Appliance"].unique())```**_The final dataframes look like this:_**Smart Meter Data:```{python}df1.to_csv("Smart_Meter_Data.csv", index=False)df1.head()```Plugs Data:```{python}df2.to_csv("Plugs_Data.csv", index=False)df2.head()```# IV. Exploratory Data Analysis## A. Statistical Analysis**Description of the Data:**```{python}print("------------------------------------\n")print("Smart Meter Data: \n\n", df1.describe())print("\n------------------------------------\n")print("Plugs Data: \n\n", df2.describe())print("\n------------------------------------")```## B. Visual Analysis### i. Time Series AnalysisThe time series analysis is done on the smart meter data. It is done to see what are the trends of the total electricity consumption over time of all 3 households.```{python}# Create a new dataframedf_plot1 = df1.groupby(["Date", "Household"])["powerallphases"].mean().reset_index()# Create a figure objectfig = go.Figure()for household in df_plot1["Household"].unique():# Add a trace for each household fig.add_trace( go.Scatter( x=df_plot1[df_plot1["Household"] == household]["Date"], y=df_plot1[df_plot1["Household"] == household]["powerallphases"], mode="lines", name=household, ) )# Update the layout of the plotfig.update_xaxes(title=dict(text="Date", font=dict(size=15)), tickformat="%Y/%m/%d")fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)))fig.update_layout( title=dict(text="Total Power Consumption over Time", font=dict(size=20)), legend_title_text="Household", legend=dict(x=0.905, y=0.99), template="ggplot2",)# Display the plotfig.show()```_Observations:_* The total electricity consumption increases over time (from July to January) for all the 3 households. * If we look at the individual households, we can see that the total electricity consumption of household 4 is the highest, followed by household 5 and then household 6. The total electricity consumption of household 6 is the lowest. This puts forward many questions like why is the total electricity consumption of household 3 the lowest? Is it because the household is not using any appliances? Is it because the household is using appliances that consume less electricity? Is it because there are less people in the household? and many more.* It is also clear that Household 6 has almost constant electricity consumed during the period of 10 November 2012 to 4 January 2013.* For Household 4, on 13 December 2012, the total electricity consumption is very high. These can be explored by doing further analysis.**Further Analysis for Households 4 and 6:**It is evident from the above plot that there are some anomalies in the data for households 4 and 6. To further analyze these anomalies, the Plugs level data for these 2 households is seen separately._Household 4:_ On 13 December 2012, the total electricity consumption is very high. Plotting the electricity consumption for each appliance for that day might give some insight into the reason for the high electricity consumption.```{python}# Create a new dataframedf_plot2 = df2[["Appliance", "Hour", "Reading"]][ (df2["Household"] =="04") & (df2["Date"] =="2012-12-13")]# Create a figure objectfig = go.Figure()# Define a list of colors for each appliancecolors = ["#ff6664", "#ffcc68", "#a86ccd", "#67e068", "#7591bd"]# Add a trace for each hourfor hour in df_plot2["Hour"].unique(): fig.add_trace( go.Bar( x=df_plot2[df_plot2["Hour"] == hour]["Appliance"], y=df_plot2[df_plot2["Hour"] == hour]["Reading"], name=str(hour), visible=False, marker_color=colors, ) )# Update the layout of the plotfig.update_xaxes(title=dict(text="Appliance", font=dict(size=15)))fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])fig.update_layout( title=dict( text="Power Consumption of Each Appliance (for Household 4) for Hour 1 <br><sup>Note: The power consumption is only for 13 December 2012</sup>", font=dict(size=20), ), template="ggplot2",)fig.data[0].visible =True# Create and add slidersteps = []for i inrange(len(fig.data)): step =dict( method="update", args=[ {"visible": [False] *len(fig.data)}, {"title": "Power Consumption of Each Appliance (for Household 4) for Hour "+str(i +1) +"<br><sup>Note: The power consumption is only for 13 December 2012</sup>" }, ], # layout attribute label=i +1, ) step["args"][0]["visible"][i] =True# Toggle i'th trace to "visible" steps.append(step)sliders = [dict(active=0, currentvalue={"prefix": "Hour: "}, pad={"t": 30}, steps=steps)]# Update the layoutfig.update_layout(sliders=sliders)# Display the plotfig.show()```_Observations:_* The electricity consumption for `Refrigerator` is high throughout the day. * Starting from 2:00 PM (Hour 14), it can be seen that there is a high usage of `Entertainment` appliances which include _TV and Stereo_.* At 7:00 PM (Hour 19), there is a unusually high spike in the electricity consumption for `Kitchen appliances`._Household 6:_From 10 November 2012 to 4 January 2013, the total electricity consumption is almost constant. Plotting the electricity consumption for each appliance for that period might give some insight into the reason for the constant electricity consumption.```{python}# Create a new dataframedf_plot3 = df2[["Date", "Appliance", "Hour", "Reading"]][ (df2["Household"] =="06")& (df2["Date"] >"2012-11-09")& (df2["Date"] <"2013-01-05")]df_plot3 = df_plot3.groupby(["Date", "Appliance"])["Reading"].mean().reset_index()# Create a figure objectfig = go.Figure()# Define a list of colors for each appliancecolors = ["#ff6664", "#ffcc68", "#a86ccd", "#67e068", "#7591bd"]i =0# Add a trace for each appliancefor appliance in df_plot3["Appliance"].unique(): fig.add_trace( go.Bar( x=df_plot3[df_plot3["Appliance"] == appliance]["Date"], y=df_plot3[df_plot3["Appliance"] == appliance]["Reading"], name=appliance, visible=False, marker_color=colors[i], ) ) i = i+1# Update the layout of the plotfig.update_xaxes(title=dict(text="Date", font=dict(size=15)), tickformat="%Y/%m/%d")fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])fig.update_layout( title=dict( text="Average Power Consumption of Electronics (for Household 6) <br><sup>Note: The power consumption is only from 10 November 2012 to 4 January 2013</sup>", font=dict(size=20), ), template="ggplot2",)fig.data[0].visible =Truebutton_list = []for i inrange(len(fig.data)): button_dict =dict( label = fig.data[i].name, method="update", args=[{"visible": [False] *len(fig.data)}, {"title": "Average Power Consumption of "+ fig.data[i].name +" (for Household 6) <br><sup>Note: The power consumption is only from 10 November 2012 to 4 January 2013</sup>"}] # layout attribute ) button_dict["args"][0]["visible"][i] =True# Toggle i'th trace to "visible" button_list.append(button_dict)fig.update_layout( updatemenus=[dict( active=0, buttons=button_list, direction="down", pad={"b": 20, "t": 20}, showactive=True, x=0.86, xanchor="left", y=1.21, yanchor="top" ), ])# Display the plotfig.show()```_Observations:_* Every appliance except `Entertainment` appliances has constant and very low consumption throughout the period.* The `Entertainment` appliances have a constant and low consumption but comparatively higher than the other appliances.* Another interesting observation is that since the `Lamp` is triggered by the _Motion Sensor_, it might be possible that there is some defect in the Lamp since all other observations conclude to the family not being at home.### ii. Appliance AnalysisThe appliance analysis is done on the plugs data. It is done to see what are the trends of the electricity consumption of each appliance over time for each household. This also helps in identifying some living patterns of the families living in the households.**Living trends analysis:**To identify the living trends of the families living in the households, the electricity consumption of each appliance is analyzed for each household.```{python}# Create a new dataframedf_plot4 = df2.groupby(["Hour", "Appliance", "Household"]).mean().reset_index()# Create a figure objectfig = go.Figure()# Define listscolors = ["#ff6664", "#ffcc68", "#fee664", "#a86ccd", "#67e068", "#7591bd"]households = df_plot4["Household"].unique()hours = df_plot4["Hour"].unique()# Add a trace for each appliancefor household in households:for hour in hours: fig.add_trace( go.Bar( x=df_plot4[ (df_plot4["Household"] == household) & (df_plot4["Hour"] == hour) ]["Appliance"], y=df_plot4[ (df_plot4["Household"] == household) & (df_plot4["Hour"] == hour) ]["Reading"], name=appliance +" - Household "+ household, visible=False, marker_color=colors, ) )# Update the layout of the plotfig.update_xaxes(title=dict(text="Appliance", font=dict(size=15)))fig.update_yaxes(title=dict(text="Power (Watts)", font=dict(size=15)), range=[0, 150])fig.update_layout( title=dict( text="Average Power Consumption of Appliances for at Hour 1 <br><br>", font=dict(size=20), ), template="ggplot2",)fig.data[0].visible =True# Create slider steps for each householdslider_steps = []for i, household inenumerate(households): step = []for j, hour inenumerate(hours): visible = [False] *len(fig.data) visible[i *len(hours) + j] =True step.append({"label": j+1,"method": "update", "args": [{"visible": visible}, {"title": "Average Power Consumption of Appliances at Hour "+str(hour) +"<br><br>"} ] } ) slider_steps.append(step)# Create a slider for each householdsliders = []for i, household inenumerate(households): slider = [dict( active=0, currentvalue={"prefix": "Hour: "}, pad={"t": 30}, steps=slider_steps[i], ) ] sliders.append(slider)# Add the slider for the first household to the figure layoutfig.update_layout(sliders=sliders[0])# Create a list of visible traces for each householdvisible_list = []for i inrange(0, len(fig.data), len(hours)): visible = [False] *len(fig.data) visible[i] =True visible_list.append(visible)# Create a button for each continentbuttons = []for i, household inenumerate(households): button =dict( label="Household "+str(int(household)), method="update", args=[{"visible": visible_list[i]}, {"sliders": sliders[i]}], ) buttons.append(button)# Add the buttons to the figure layoutfig.update_layout( updatemenus=[dict(type="buttons", active=0, buttons=buttons, direction="right", pad={"r": 10, "t": 15}, x=0.355, xanchor="left", y=1.2, yanchor="top", ), ])# Show the figurefig.show()```_Observations:_* _Household 4:_ There is high usage of `Refrigerator` throughout the day. It can also be seen there are spikes in the usage of `Kitchen appliances` at 6:00 AM, 12:00 PM and 7:00 PM. Adding on to that, `Entertainment` appliances are used just after the Kitchen appliances. There is also no use of appliances from 12:00 AM to 6:00 AM.* _Household 5:_ As compared to Household 4, the use of `Refrigerator` is low but still higher that all their other appliances. There is less usage of Kitchen appliances as compared to Household 4. In contrast to household 4, this family watches more TV starting from 6:00 PM since there is a lot of usage for `Entertainment` appliances. There is also no use of appliances from 11:00 PM to 7:00 AM.* _Household 6:_ All the appliances used have a very low average power consumption throughout the day. Hence it is difficult to make any conclusions about the living patterns of the family living in this household. Hence I will not be doing any further analysis on this household.**Consumption Analysis based on Day, Holiday and Season:**To analyze the usage of appliances on weekdays, weekends, different seasons or holidays, the data is grouped by the respective categories and the average power consumption of each appliance is calculated. The data is then plotted to see the trends.```{python}# Create a new dataframedf2["Month"] = df2["Date"].dt.monthdf2["Month"] = df2["Month"].apply(lambda x: datetime.datetime.strptime(str(x), "%m").strftime("%B"))df_plot5 = df2.groupby(["Month", "Appliance", "Day"])["Reading"].mean().reset_index()df_plot6 = ( df2.groupby(["Month", "Appliance", "Holiday"])["Reading"].mean().reset_index())df_plot7 = df2.groupby(["Month", "Appliance", "Season"])["Reading"].mean().reset_index()# Create selection attributesbar = alt.selection(type="single", encodings=["x"])color = alt.condition(bar, alt.Color("Appliance:N"), alt.value("lightgray"))# Chart 1chart1 = ( alt.Chart(df_plot5) .mark_bar() .encode( x=alt.X("Appliance:N", title=""), y=alt.Y("mean(Reading):Q", title="Average Power Consumption", axis=alt.Axis(titleFontSize=12), ), tooltip=["Appliance", "mean(Reading)"], color=color, column=alt.Column("Day:N", title=""), ) .properties( title={"fontSize": 15,"text": "By Day:","align": "left","anchor": "start","color": "gray", }, width=alt.Step(28), ) .add_selection(bar))# Chart 2chart2 = ( alt.Chart(df_plot6) .mark_bar() .encode( x=alt.X("Appliance:N", title=""), y=alt.Y("mean(Reading):Q", title="Average Power Consumption", axis=alt.Axis(titleFontSize=12), ), tooltip=["Appliance", "mean(Reading)"], color=color, column=alt.Column("Holiday:N", title=""), ) .properties( title={"fontSize": 15,"text": "By Holiday:","align": "left","anchor": "start","color": "gray", }, width=alt.Step(28), ) .add_selection(bar))# Chart 3chart3 = ( alt.Chart(df_plot7) .mark_bar() .encode( x=alt.X("Appliance:N", title=""), y=alt.Y("mean(Reading):Q", title="Average Power Consumption", axis=alt.Axis(titleFontSize=12), ), tooltip=["Appliance", "mean(Reading)"], color=color, column=alt.Column("Season:N", title=""), ) .properties( title={"fontSize": 15,"text": "By Season:","align": "left","anchor": "start","color": "gray", }, width=alt.Step(40), ) .add_selection(bar))# Combine the chartschart = alt.hconcat(chart1, chart2)chart = alt.vconcat(chart, chart3, title="Power Consumption of each Appliance")chart = chart.configure_title(fontSize=20, anchor="middle").configure_view( fill="#ededed")# Show the chartchart```_Observations:_* For `Entertainment` appliances, there is a higher consumption on weekends as compared to weekdays. This is also the case for Holidays and Non-Holidays. During summer, the consumption of electricity by entertainment appliances is far less than the other seasons.* For `Refrigerator`, there is a higher consumption on weekends as compared to weekdays. The use of refrigerator is more in Summer and Fall as compared to Winter.* Moving to `Kitchen appliances`, there is a higher consumption on weekdays as compared to weekends. This is not the case for Holidays. Different seasons do not affect the use of kitchen appliances by a lot but there is a slight increase in the use of kitchen appliances during Fall and Winter as compared to Summer.* Another interesting appliance is `Fountain`. It is observed that there is little to no variation in the usage of fountain for weekends, weekdays, holidays and seasons. But during fall and winter, there is no consumption by fountain.* For `Lamp`, there is a higher consumption during winter as compared to summer and fall.# V. Results and InferencesIt is observed that the total electricity consumption increases over time for all three households. However, the total electricity consumption of household 4 is the highest, followed by household 5 and then household 6, with household 6 having the lowest consumption.Further analysis of the plugs level data for households 4 and 6 reveals some anomalies in the data. On 13 December 2012, household 4 had a very high electricity consumption, and plotting the electricity consumption for each appliance for that day revealed that the high consumption was due to high usage of entertainment and kitchen appliances, possibly because the family had guests over for dinner. On the other hand, household 6 had almost constant electricity consumption from 10 November 2012 to 4 January 2013. The analysis of the electricity consumption for each appliance during this period reveals that every appliance except entertainment appliances had constant and very low consumption, which suggests that the family might have been away from home during this period.The appliance analysis of the plugs data provides some interesting insights into the living patterns of the families living in the households. `Household 4` had high usage of the refrigerator throughout the day, with spikes in the usage of kitchen appliances at specific times, suggesting that the family makes breakfast, lunch, and dinner at specific times throughout the day. The family also watches TV after using the kitchen appliances which indicated that they watch TV after cooking food, and there is no usage of appliances from midnight to 6:00 am, which suggests that the family sleeps during this time. In contrast, `Household 5` had lower usage of the refrigerator but still higher than all other appliances, and a lower usage of Kitchen Appliances indicating that the family is busy and does not cook many meals at home. This family also watches more TV as compared to the family of Household 4, and there is no usage of appliances from 11:00 pm to 7:00 am, suggesting that the family sleeps during this time.The consumption analysis based on day, holiday, and season reveals that the usage of `Entertainment` appliances is higher on weekends and holidays which is consistent with the observation that people tend to watch more TV during their leisure time. Moving to `Kitchen` appliances, there is a higher consumption on weekdays as compared to weekends. During weekdays, families go for work and usually get up on time, make food and carry tiffins and stay on the same routine throughout. But during weekends, they do not have to get up and work and can enjoy their day offs by going out for food and spending time with family more. This is not the case for Holidays. One would expect that on holidays, people will go out for food and hence the use of kitchen appliances will reduce. But it can also be assumed that, on holidays families have guests over or cook traditional dishes to celebrate the holiday and hence the use of kitchen appliances increases. Different seasons do not affect the use of kitchen appliances by a lot but there is a slight increase in the use of kitchen appliances during Fall and Winter as compared to Summer. For `Refrigerator`, there is a higher consumption on weekends as compared to weekdays. This is because people tend to go out more on weekends and hence the refrigerator is not used as much. The use of refrigerator is more in Summer and Fall as compared to Winter. This is fairly obvious, that people do not need to store food inside the refrigerator or eat colder food since the weather is already cold. Another interesting appliance is `Fountain`. It is observed that there is little to no variation in the usage of fountain for weekends, weekdays, holidays and seasons. But during fall and winter, there is no consumption by fountain since it is cold outside and the water will freeze and hence the fountain will not be used. Lastly for `Lamp`, there is a higher consumption during winter as compared to summer and fall. This is because people tend to stay indoors more during winter and hence the lamp is used more. This is also because the duration of daylight is less during winter and hence the lamp is used more to provide light.Overall, the analysis provides some valuable insights into the electricity consumption patterns of the households and their living patterns, which can be used to develop energy-saving strategies and improve energy efficiency.