Client Background

Client Name: A leading IoT tech firm in the USA

Industry Type: Energy Technology (Industrial IoT–enabled Smart Energy Solutions)

Products & Services: Smart, connected lithium-ion batteries and an energy storage systems platform that delivers a seamless transition from outdated power systems to smart, sustainable, lithium-powered operations.

Organization Size: 100+

The Problem

In large-scale industrial environments, IoT-enabled machines generate continuous telemetry data that is distributed across multiple Kafka topics and often contains duplicates, inconsistencies, and incomplete records. The absence of a scalable and automated real-time ETL pipeline makes it difficult to aggregate, clean, and manage both streaming and historical data efficiently, as well as to store it reliably in cloud storage for downstream analytics. 

The client therefore needed a robust real-time ETL system that continuously ingests IoT data from Kafka, cleans and deduplicates it, and securely stores the processed data in AWS S3 and MySQL for analysis and visualization.

Our Solution

The solution is a real-time ETL pipeline built with Kafka and Python and integrated with AWS S3 and MySQL. Data is ingested from multiple Kafka topics fed by two BMS devices; the topics carry both static and dynamic telemetry as nested JSON. The Python ETL process cleans and deduplicates the data: static records are deduplicated by HWID, while a dynamic record is treated as a duplicate only when all of its fields match an earlier record. The processed data is stored as CSV files in topic-wise folders in AWS S3 and simultaneously loaded into a MySQL database for efficient querying and analytics.
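The two deduplication rules can be illustrated with a minimal sketch. This is not the project's actual script: the function names, the dotted-key flattening convention, and every field other than HWID are assumptions made for the example.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def dedupe_static(records, key="HWID"):
    """Keep only the first static record seen for each HWID."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


def dedupe_dynamic(records):
    """Drop a dynamic record only if every field matches an earlier record."""
    seen = set()
    unique = []
    for rec in records:
        # A canonical JSON string serves as a hashable fingerprint of all fields.
        fingerprint = json.dumps(rec, sort_keys=True, default=str)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


# Hypothetical payloads: only "HWID" comes from the write-up above.
static_msgs = [
    {"HWID": "A1", "info": {"fw": "1.0"}},
    {"HWID": "A1", "info": {"fw": "1.1"}},  # same HWID, so it is dropped
]
dynamic_msgs = [
    {"HWID": "A1", "cell": {"voltage": 3.7}},
    {"HWID": "A1", "cell": {"voltage": 3.7}},  # identical in every field, dropped
    {"HWID": "A1", "cell": {"voltage": 3.6}},  # one field differs, kept
]
static_rows = dedupe_static([flatten(m) for m in static_msgs])
dynamic_rows = dedupe_dynamic([flatten(m) for m in dynamic_msgs])
```

Fingerprinting the whole record keeps the dynamic rule strict: two readings from the same device survive unless they are identical in every field.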

Solution Architecture

Deliverables

Grafana Real-time Monitoring Dashboard
Overview Dashboard
Equipment Dashboard

Form for registration of new equipment

Delete/Update operations on form data

Tech Stack

  • Tools used: Grafana
  • Languages/techniques used: Python, SQL, Visualization, Data Analytics
  • Models used: None
  • Skills used: Data Extraction, Data Transformation, Data Loading, Visualization, Data Analytics
  • Databases used: MySQL
  • Web cloud servers used: Ubuntu VM instance

Technical Challenges Faced During Project Execution

One of the primary technical challenges was developing a robust Python ETL script to efficiently extract, transform, and load data into Amazon S3 and MySQL while ensuring data accuracy and consistency. Optimizing the script to minimize memory usage and CPU consumption was crucial for handling large datasets and ensuring scalability.

Another challenge involved creating interactive data visualizations, including interlinked dashboards for seamless data exploration. Additionally, implementing a secure and scalable user registration process required careful handling of authentication, data validation, and secure storage of user credentials.

Overall, the project required efficient coding, performance optimization, and well-designed system architecture to deliver a reliable and user-friendly solution.

How the Technical Challenges Were Solved

The technical challenges were addressed through thorough research and implementation of suitable solutions. To improve data visualization, various visualization options in Grafana were explored to create more interactive, user-friendly dashboards, along with HTML-based graphical visualizations for better presentation and usability.

Performance and data handling challenges were resolved by optimizing the Python ETL scripts for efficient data processing and resource usage. 
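The write-up does not say which optimizations were applied. One common way to keep memory bounded in a streaming ETL job is to process messages in fixed-size batches and write each batch as its own CSV object under a per-topic prefix. The sketch below assumes that approach; the function names, the key layout, and the batch size are hypothetical, and `s3_client` is assumed to be a boto3 S3 client (only its `put_object` call is used).

```python
import csv
import io
from itertools import islice


def in_batches(messages, size):
    """Yield lists of at most `size` items so only one batch sits in memory."""
    iterator = iter(messages)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def rows_to_csv(rows):
    """Serialize flat dict rows to CSV text; the header is the sorted union of keys."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()


def upload_batch(s3_client, bucket, topic, batch_id, rows):
    """Store one batch as a CSV object in a topic-wise folder (hypothetical key layout)."""
    key = f"{topic}/batch_{batch_id:06d}.csv"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    return key
```

Because `in_batches` consumes any iterable lazily, it can wrap a Kafka consumer loop directly, so the process never holds more than one batch of records at a time.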

Business Impact

The implemented real-time ETL pipeline enables organizations to reliably process and manage high-volume IoT telemetry data with minimal latency. By ensuring clean, deduplicated, and well-structured data, it improves data accuracy and trust for analytics and reporting. Storing processed data in AWS S3 and MySQL enhances scalability, availability, and cost efficiency while supporting faster insights into device performance, fault detection, and operational trends. Overall, the solution reduces manual data handling, improves operational efficiency, and supports data-driven decision-making in large-scale industrial environments.

Project Snapshots

Project Video