Auckland University of Technology:

Foundation of Data Science - 26 April 2023

Code:

Jupyter Notebook - on GitHub

Data Information:

Electric Power Consumption Dataset - UC Irvine (UCI) Machine Learning Repository

Case Study: Household Power Consumption Analysis

This project was part of the Foundation of Data Science course in the Graduate Diploma in Computer and Information Science program. It focuses on analyzing and predicting household power consumption using Python and a dataset from a household near Paris, France. The study primarily examined Global Active Power and its correlation with other factors, aiming to create reliable models for energy consumption forecasting.


Objective:

The primary goal was to explore, visualize, and model power consumption patterns to provide insights into energy usage and optimization.


Key Features:

  • Dataset: Nearly 4 years of household power consumption data with variables such as Global Active Power, Voltage, and sub-metering data for various equipment.
  • Data Exploration & Visualization:
    • Cleaning and summarizing the dataset to identify trends and patterns.
    • Insights:
      • Power consumption peaks during winter and weekends.
      • Uneven sub-metering distributions indicated potential biases in the equipment usage data.
  • Correlation Analysis: Heatmaps and regression models highlighted significant relationships between Global Active Power and other variables such as Global Intensity.
  • Model Development:
    • Iterative linear regression models were developed to improve prediction accuracy.
    • Model 2, which incorporated additional variables, achieved an R² of 99.8%.
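
The iterative modeling step above could be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it uses synthetic data in place of the real dataset, the helper `fit_ols` is hypothetical, and the choice of voltage as the added predictor is an assumption. It fits ordinary least squares with NumPy and compares R² as a predictor is added.

```python
import numpy as np

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; return (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef, 1.0 - ss_res / ss_tot

# Synthetic stand-in for the household data (assumption: not the real dataset).
rng = np.random.default_rng(0)
n = 2000
voltage = rng.normal(240.0, 3.0, n)      # volts
intensity = rng.uniform(0.5, 20.0, n)    # amperes
# Active power in kW is roughly V * I / 1000, plus measurement noise.
active_power = voltage * intensity / 1000.0 + rng.normal(0.0, 0.05, n)

# Model 1: a single predictor.
_, r2_model1 = fit_ols(intensity, active_power)
# Model 2: add a second predictor (assumed here to be voltage).
_, r2_model2 = fit_ols(np.column_stack([intensity, voltage]), active_power)

print(f"Model 1 R^2: {r2_model1:.4f}")
print(f"Model 2 R^2: {r2_model2:.4f}")
```

Two points are worth keeping in mind when reading such numbers. In-sample R² cannot decrease when a predictor is added to a least-squares model with an intercept, so adjusted R² or a held-out test set gives a fairer comparison between models. Also, because active power is physically close to voltage times current, a very high R² is plausible whenever intensity is among the predictors.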

Challenges & Recommendations:

  • The linear regression assumptions of homoscedasticity, normality, and independence of the residuals were partially violated, suggesting the need for:
    • Data transformations or additional predictors.
    • Exploration of advanced regression techniques to address heteroscedasticity and autocorrelation.
  • Despite the high R² value, these violations mean the model needs further refinement before its predictions can be considered reliable.
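
Residual checks of the kind described above could look something like the sketch below. It is illustrative only: the residuals are synthetic, and `durbin_watson` and `spread_ratio` are hypothetical helpers written for this example (the second is a crude, informal variance comparison, not a formal test). The Durbin-Watson statistic is near 2 when residuals are not autocorrelated and falls well below 2 under positive autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order autocorrelation of residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def spread_ratio(fitted, residuals):
    """Residual variance for the upper half of fitted values divided by the
    variance for the lower half; close to 1 when the variance is constant."""
    fitted = np.asarray(fitted, dtype=float)
    e = np.asarray(residuals, dtype=float)
    order = np.argsort(fitted)
    half = len(e) // 2
    low, high = e[order[:half]], e[order[half:]]
    return float(np.var(high) / np.var(low))

# Synthetic residuals (assumption: stand-ins for a fitted model's residuals).
rng = np.random.default_rng(1)
n = 5000
white = rng.normal(0.0, 1.0, n)          # independent, constant variance

# AR(1) residuals with strong positive autocorrelation (rho = 0.9).
ar1 = np.empty(n)
ar1[0] = white[0]
for t in range(1, n):
    ar1[t] = 0.9 * ar1[t - 1] + white[t]

fitted = rng.uniform(0.5, 5.0, n)
hetero = rng.normal(0.0, 1.0, n) * fitted  # noise grows with the fitted value

dw_white = durbin_watson(white)
dw_ar1 = durbin_watson(ar1)
ratio_const = spread_ratio(fitted, white)
ratio_hetero = spread_ratio(fitted, hetero)

print(f"Durbin-Watson, independent residuals: {dw_white:.2f}")
print(f"Durbin-Watson, AR(1) residuals:       {dw_ar1:.2f}")
print(f"Variance ratio, constant spread:      {ratio_const:.2f}")
print(f"Variance ratio, growing spread:       {ratio_hetero:.2f}")
```

For a real analysis, established implementations are preferable to hand-rolled ones: statsmodels offers `durbin_watson` in `statsmodels.stats.stattools` and the Breusch-Pagan test as `het_breuschpagan` in `statsmodels.stats.diagnostic`. Minute-level power readings are a time series, so autocorrelated residuals are to be expected from a plain linear regression.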

Tools Used:

  • Python for data analysis and modeling.
  • Matplotlib and Seaborn for data visualization.

Impact:

The project provided valuable insights into energy consumption patterns, suggesting opportunities for optimization and environmental sustainability.

Insights were shared through visualizations and a detailed analytical report.

See Complete Project Here

  • Data Analysis
  • Energy Optimization
  • Python & Visualization

Visualizations

Total Power Consumption in Different Sub_metering Areas

Heatmap Matrix
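
The correlation matrix behind a heatmap like this one can be computed along the following lines. This is a sketch under stated assumptions: the data are synthetic, and the three variable names are used as illustrative labels rather than taken from the notebook.

```python
import numpy as np

# Synthetic stand-in for three of the dataset's variables (assumption).
rng = np.random.default_rng(2)
n = 1000
names = ["Global_active_power", "Global_intensity", "Voltage"]
intensity = rng.uniform(0.5, 20.0, n)    # amperes
voltage = rng.normal(240.0, 3.0, n)      # volts
active_power = voltage * intensity / 1000.0 + rng.normal(0.0, 0.05, n)  # kW

# Pearson correlation matrix, treating each column as one variable.
data = np.column_stack([active_power, intensity, voltage])
corr = np.corrcoef(data, rowvar=False)

for name, row in zip(names, corr):
    print(f"{name:>20}: " + "  ".join(f"{v:+.2f}" for v in row))
```

The resulting matrix can then be drawn with Seaborn, for example `sns.heatmap(corr, annot=True, xticklabels=names, yticklabels=names)`.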