IoT Health Monitoring Dashboard for Device Uptime & Maintenance

Client: Small-scale manufacturer of digital screen devices

Tools Used: Google BigQuery, SQL, Power BI

Type: End-to-end data pipeline and dashboard build

Status: Phase 1 completed — first version delivered for stakeholder feedback

Background

The client operates six digital screens installed across two locations. These devices send IoT health data every 5 minutes, capturing metrics like battery levels, CPU and screen temperature, RAM usage, connectivity, and content playback status. Despite this high-frequency data, the client has no clear visibility into the health of their devices. They often find out about problems only when customers report that a screen is not functioning.

✅ Task Definition

Build a full-stack data solution that allows:

Management to monitor the overall health and uptime of the device fleet across locations.
Maintenance to identify individual device issues in real-time, spot recurring problems, and transition from reactive to proactive maintenance.

The solution must:

Ingest and transform raw IoT logs.
Detect incidents (e.g., overheating, low battery, offline, or not playing content).
Track uptime and severity scores.
Visualize key metrics through actionable dashboards.

🔍 Summary - Insights & Next Steps

This project demonstrates how structured cloud analytics and clear dashboards can turn raw IoT data into actionable insight. By designing a scalable data model in BigQuery and building interactive dashboards in Power BI, I enabled the client to monitor device uptime, detect root-cause issues, and support both management and maintenance with tailored views. The first version already highlights key failure patterns — such as overheating or low battery — laying the foundation for proactive maintenance.

Next, I will present this phase-1 pipeline to the client to validate whether it answers their key business questions, learn what they like or dislike, and identify any new questions that arise. Real insight often begins when users see their data — and this first iteration opens the door to a more focused and collaborative development process. Future steps include refining incident logic, modifying risk scoring becoming more predictive, and automating alerts to move from reactive fixes to preventative action.

Quick overview of the dashboards created as phase 1 deliverable to continue discussion with client.

Approach & Strategy

This project was treated as the first iteration in a long-term data improvement initiative. The idea was to build a working pipeline quickly, start answering key business questions, and open a feedback loop with stakeholders to improve the model iteratively.

Steps Taken:

Data Ingestion & Storage: IoT logs were loaded into Google BigQuery as a partitioned fact table, ensuring scalability.
Transformations: SQL was used to:

Normalize and enrich the logs with status flags and severity scores.
Detect and merge incident windows using time-aware logic (e.g., gaps ≤ 10 min).
Aggregate daily metrics like uptime % and device health scores.

Data Modeling: Dimensional model created with fact table and supporting dimension tables (devices, dates, locations, incidents).
Visualization: Power BI dashboards built for two personas:

Management View: Location-level health, fleet overview, uptime trends.
Maintenance View: Critical device list, incident types, screen status timeline.

Validation: DAX metrics were tested against SQL outputs for consistency; filters and drilldowns verified.

🧠 Why This Matters

To the end customer, the screen either works — or it doesn’t. But for the manufacturer, early signs like high temperature, low battery, or lack of playback provide critical warning signs. This project is about:

Bringing visibility into the invisible.
Saving time and money through proactive issue detection.
Building a scalable foundation for predictive alerting and automation.

🔁 Next Steps

Refine incident detection logic further.
Improve severity scoring (rebranded as “Health Score” for better communication).
Introduce automatic alerts via Power BI
Engage with maintenance and management to prioritize key views and metrics.
Begin building predictive models once enough history is gathered.

🔍 Python Data Exploration

To understand the behavior of each screen device and assess data quality, I began with Python-based exploratory analysis using Pandas, Matplotlib, and Seaborn.

📉 Data Frequency & Gaps

Each device is expected to report every 5 minutes. But this isn’t always the case:

Row counts vary significantly across devices.
Gaps >15 minutes reveal irregular logging and potential connectivity or battery issues.

These irregularities directly affect how we define downtime and merge incident periods later in the pipeline.

📈 Device Behavior Over Time

Visualizing time-series trends for key attributes like RAM usage and CPU temperature showed clear behavioral differences between devices:

This suggests that applying a single, fixed threshold across all devices may lead to inaccurate incident detection. A more robust approach could be to:

Learn normal operating ranges per device
Flag deviations from individual baselines as potential anomalies

🔋 Battery Level as Early Indicator

Battery level emerged as a highly valuable and interpretable metric:

When below 20% and not charging, the device is at risk of shutting down

Over time, battery depletion patterns could signal battery health decline.

In future iterations, I could build a “time to battery death” prediction using discharge rates.

📺 Screen Content Status = Ground Truth

The most critical signal for client experience is whether the screen is actually showing content.

This binary status (content_playing = TRUE/FALSE) was visualized using a red/green timeline — a clear, visual cue of when devices are doing their job vs. when they’re failing.

This signal became my primary proxy for uptime during this first iteration of the pipeline.

🧠 Learnings & Next Steps

So far, this was light-touch exploration to:

Understand device behavior patterns
Spot quality issues in the data
Identify reliable early signals of failure

In future iterations, I plan to:

Use correlation & regression tests to uncover predictive patterns
Tune thresholds per device
Explore multi-metric incident scoring
Enable smarter alerting based on combined risk signals

Environment

🧠 BigQuery Environment & Data Modeling

I uploaded the raw IoT log data to Google BigQuery, ensuring automatic schema detection correctly handled the field types (e.g., timestamp, BOOLEAN, FLOAT64, etc.).

From there, I built a set of structured, query-ready tables to support both analysis and dashboarding. These are outlined in detail in the Data Model Documentation, which includes table definitions, SQL logic, and foreign key relationships.

📎 The full data model document is available behind this link 👉 DATA MODEL DOCUMENTATION

🗃️ Core Tables

Key components of the BigQuery environment include:

✅ FACT_iot_device_health_logs

(Fact Table)

Granular base table, partitioned by date
Contains all raw IoT readings: battery level, CPU temp, RAM, network status, etc.
Serves as the foundation for most metrics

✅ dim_device_incidents

Contains one row per incident period
Incidents are merged by logic to handle gaps/missing logs
Includes both incident type and average attributes during the period (e.g., avg battery, RAM)
Enables clean incident reporting and root-cause analysis

✅ dim_device_current_status

A view of the latest record per device
Powers dashboard elements for current fleet health
Used by both management and maintenance personas

✅ dim_devices, dim_locations, dim_dates

Standard dimension tables for consistent filtering and slicers in Power BI

🧱 A visual schema of this model is included below.

☁️ Cloud-Based Processing

This structure keeps heavy transformation logic in BigQuery, which:

Ensures scalability as data volume grows
Reduces load on Power BI refreshes
Keeps logic centralized and transparent

📝 Summary

BigQuery serves as the central data warehouse in this project — hosting clean, structured, and optimized tables. This allows Power BI to focus on visualization and interaction, while all complex calculations (e.g., incident logic, uptime%, severity scoring) are handled in SQL.

✅ This separation of concerns results in a fast, scalable, and maintainable solution — ready to grow with future needs.

Power BI Dashboards

After preparing the data in BigQuery, I connected Power BI directly to the dataset using the native BigQuery connector. This allowed me to pull only curated tables and views, keeping the model clean and fast to refresh.

📈 Persona-Based Dashboards

Two dashboard pages were designed, each addressing the goals of a specific stakeholder group:

👨‍💼 Management Dashboard

Focus:

Uptime by location
Severity score per device
Current health overview
Location-level summaries

Visuals:

KPI cards - Latest device statuses based on content showing or not
Count of incidents per location/device
Location/Device Health Index (aggregated metric based on devices status to determine a health score)

Uptime % per device on selected day

🧑‍🔧 Maintenance Dashboard