I recently built a Theoretical Solar Flare Grid Impact Intelligence System for the Databricks Free Edition Hackathon 2025, and I wanted to share my journey building an end-to-end data engineering and ML solution on Databricks Free Edition.
Finding the Problem to Solve
This isn't the way round most people are used to working: usually a problem presents itself and you're tasked with finding the solution. However, as this was a hackathon with a broad scope, I thought I'd flip that and go looking for a problem to solve. As a data engineer I love to solve problems, understand the core concepts of whatever it is I'm looking at, and build a full picture. It's this aspect that also draws me to physics in my spare time, so I used this hackathon as a way to dive into some data I find genuinely interesting: solar flares. I'm not an expert on solar flares, but knowing that they can wreak havoc on electrical systems, especially with aging grid infrastructure, I wondered "what if we could predict these events days in advance and prepare for outages?" Et voilà, the problem to solve.
Building the Solution to the Problem
The Solution
A system that combines NASA space weather data with power grid monitoring data to provide predictive intelligence for grid operators through a natural language query interface.
Architecture Overview
Delta Live Tables Pipeline (Medallion Architecture):
🥉 Bronze Layer:
- Ingested NASA space weather observations (solar flare class, intensity, timing)
- Ingested power grid fault detection data (voltage, current, temperature, health scores)
- Used Auto Loader for streaming data ingestion
🥈 Silver Layer:
- Implemented data quality expectations with @dlt.expect_or_drop()
- Validated timestamps, flare classifications, voltage ranges, temperature limits
- Enriched data with severity classifications and temporal features
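The Silver-layer rules above can be illustrated outside a pipeline with a minimal plain-Python sketch. It mimics drop-on-failure expectations and adds a severity column. The field names (`observed_at`, `peak_flux`, `measured_at`, `voltage_kv`, `temperature_c`) and the voltage and temperature limits are hypothetical, since the post does not list the real schema or bounds. The flare class thresholds follow the standard GOES X-ray flux scale.

```python
from datetime import datetime
from typing import Optional

# GOES soft X-ray peak flux floors (W/m^2) for each flare class, strongest first.
FLARE_CLASS_FLOORS = [("X", 1e-4), ("M", 1e-5), ("C", 1e-6), ("B", 1e-7)]

# Hypothetical validity limits; the real pipeline's bounds are not given in the post.
VOLTAGE_RANGE_KV = (0.0, 500.0)
TEMPERATURE_RANGE_C = (-40.0, 150.0)


def flare_class(peak_flux: float) -> str:
    """Map a peak X-ray flux to its letter class (A is anything below 1e-7)."""
    for letter, floor in FLARE_CLASS_FLOORS:
        if peak_flux >= floor:
            return letter
    return "A"


def parse_timestamp(value) -> Optional[datetime]:
    """Return a datetime for an ISO-8601 string, or None if it cannot be parsed."""
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def clean_solar(rows):
    """Drop rows that fail an expectation, then enrich survivors with a class."""
    cleaned = []
    for row in rows:
        ts = parse_timestamp(row.get("observed_at"))
        flux = row.get("peak_flux")
        if ts is None or not isinstance(flux, (int, float)) or flux <= 0:
            continue  # failing rows are dropped rather than failing the run
        cleaned.append({**row, "observed_at": ts, "flare_class": flare_class(flux)})
    return cleaned


def clean_grid(rows):
    """Keep grid readings with a valid timestamp and in-range voltage/temperature."""
    cleaned = []
    for row in rows:
        ts = parse_timestamp(row.get("measured_at"))
        volts, temp = row.get("voltage_kv"), row.get("temperature_c")
        if ts is None or volts is None or temp is None:
            continue
        if not VOLTAGE_RANGE_KV[0] <= volts <= VOLTAGE_RANGE_KV[1]:
            continue
        if not TEMPERATURE_RANGE_C[0] <= temp <= TEMPERATURE_RANGE_C[1]:
            continue
        cleaned.append({**row, "measured_at": ts})
    return cleaned
```

In a real pipeline each `continue` would correspond to one named expectation, so the number of dropped rows per rule stays visible in the pipeline's data quality metrics.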
🥇 Gold Layer:
- Created correlation tables joining solar and grid data by date
- Added temporal lag features (same-day, next-day, 2-3 days later) to capture delayed geomagnetic effects
- Built ML-enriched tables with predictions and probability forecasts
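To make the temporal lag idea concrete, here is a small sketch of how same-day, next-day, and 2-3 day lag columns could be attached to each solar day. It uses plain dictionaries keyed by date rather than Spark tables, and the column names (`max_flux`, `faults_lag_Nd`) are made up for illustration.

```python
from datetime import date, timedelta

LAGS = (0, 1, 2, 3)  # same-day, next-day, 2 and 3 days later


def add_lagged_grid_metrics(solar_by_day, grid_by_day, lags=LAGS):
    """For each solar day, attach the grid metric observed `lag` days later.

    solar_by_day: {date: daily maximum flare flux}
    grid_by_day:  {date: daily grid metric, e.g. a fault count}
    """
    rows = []
    for day in sorted(solar_by_day):
        row = {"date": day, "max_flux": solar_by_day[day]}
        for lag in lags:
            # None when the grid data has no row for that later date
            row[f"faults_lag_{lag}d"] = grid_by_day.get(day + timedelta(days=lag))
        rows.append(row)
    return rows


solar = {date(2025, 1, 1): 3e-5, date(2025, 1, 2): 1e-6}
grid = {date(2025, 1, 1): 2, date(2025, 1, 2): 5, date(2025, 1, 3): 9}
features = add_lagged_grid_metrics(solar, grid)
```

Keeping a missing lag as `None` instead of zero avoids teaching a downstream model that "no data" means "no faults".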
Key Features
1. Correlation Analysis:
- Tracked how solar flare intensity correlates with grid voltage drops, temperature rises, and equipment health degradation
- Implemented time-lagged features since geomagnetic storms take 24-72 hours to fully develop
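One simple way to check whether a lag matters is to correlate flare intensity on day t with a grid metric on day t + lag and compare the lags. The sketch below uses a hand-rolled Pearson correlation on two equal-length daily series. It is an illustration of the idea, not necessarily the calculation the pipeline used, and the sample numbers are synthetic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def lagged_correlation(flare, grid, lag):
    """Correlate flare[t] with grid[t + lag] for aligned daily series."""
    if lag < 0:
        raise ValueError("lag must be >= 0")
    if lag == 0:
        return pearson(flare, grid)
    return pearson(flare[:-lag], grid[lag:])


# Synthetic series where the grid metric echoes the flare one day later.
flare = [1, 5, 2, 8, 3, 1, 1]
grid = [0, 2, 10, 4, 16, 6, 2]
by_lag = {lag: lagged_correlation(flare, grid, lag) for lag in range(4)}
best_lag = max(by_lag, key=by_lag.get)
```

With only a handful of days per lag the coefficients are noisy, so a result like this is a prompt for investigation rather than evidence of a causal delay.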
2. ML Predictions:
- Created scenario-based predictions: Quiet Sun → Severe X-class storms
- Generated 7-day forecasts with risk levels and specific operational recommendations
3. Probabilistic Forecasting:
- Calculated historical frequency of different flare classes (B, C, M, X)
- Created probability forecasts for next 7 days
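A frequency-based forecast like this can be sketched in a few lines. The version below takes the strongest flare class seen on each historical day, turns the counts into daily probabilities, and extends them to a 7-day horizon. The extension assumes that days are independent, which is a simplification (solar activity tends to cluster), and the function names are illustrative.

```python
from collections import Counter

FLARE_CLASSES = ("B", "C", "M", "X")


def daily_class_frequency(daily_max_class):
    """Share of historical days whose strongest flare was each class."""
    total = len(daily_max_class)
    if total == 0:
        raise ValueError("no history to estimate from")
    counts = Counter(daily_max_class)
    return {cls: counts[cls] / total for cls in FLARE_CLASSES}


def prob_at_least_once(p_daily, horizon_days=7):
    """P(at least one such day within the horizon), ASSUMING independent days."""
    return 1.0 - (1.0 - p_daily) ** horizon_days


# Synthetic history: ten days, labelled by the strongest flare of each day.
history = ["B"] * 5 + ["C"] * 3 + ["M"] + ["X"]
daily = daily_class_frequency(history)
weekly = {cls: prob_at_least_once(p) for cls, p in daily.items()}
```

In this made-up history an X-class day has a 10% daily frequency, which becomes roughly a 52% chance of at least one such day in a week under the independence assumption.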
4. AI/BI Genie Integration:
- Set up natural language query interface for grid operators
- Sample queries:
  - "What happens if we get a severe X-class solar storm tomorrow?"
  - "Show me the most likely solar scenarios for the next 7 days"
  - "At what flare intensity should we activate emergency protocols?"
  - "Visualize daily faults and types"
Here's what the theoretical grid operators would get:
- 7-day forecasts that show probability estimates for different solar scenarios (from quiet sun to severe X-class storms)
- Clear risk thresholds - no guessing about when to escalate from "keep an eye on it" to "activate emergency protocols"
- Specific action plans - not vague warnings, but concrete steps like "pre-position repair crews at substations" or "alert hospitals about potential outages"
- Anomaly detection that flags unusual patterns - days when something weird is happening that needs investigation
- Natural language queries via Genie - operators can ask questions in plain English and get instant answers
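The post doesn't say how the anomaly flags are computed, so the following is only one common approach, shown as an assumption: a z-score over a daily metric, flagging days that sit far from the historical mean. The threshold of 2.5 standard deviations and the sample fault counts are arbitrary choices for illustration.

```python
from statistics import mean, pstdev


def flag_anomalies(daily_values, threshold=2.5):
    """Return the days whose value is more than `threshold` std devs from the mean.

    daily_values: {day label: numeric metric, e.g. a daily fault count}
    """
    values = list(daily_values.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a perfectly flat series has nothing unusual in it
    return [day for day, v in daily_values.items() if abs(v - mu) / sigma > threshold]


# Synthetic fault counts with one obvious spike on the last day.
faults = {"d1": 10, "d2": 11, "d3": 9, "d4": 10, "d5": 10,
          "d6": 11, "d7": 9, "d8": 10, "d9": 10, "d10": 40}
unusual_days = flag_anomalies(faults)
```

Because the spike itself inflates the mean and standard deviation, a plain z-score is conservative on short series; a median-based variant would be a reasonable alternative.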
Tech Stack
Built entirely on Databricks Free Edition using:
- Delta Live Tables for the pipeline orchestration
- Auto Loader for streaming data ingestion
- PySpark for data transformations
- AI/BI Genie for natural language queries
- Python ML libraries (RandomForest) for the predictive modeling
The Journey: Finding Problems Worth Solving
This hackathon gave me the freedom to work backwards - starting with fascinating data (solar flares) and discovering a problem worth solving (grid vulnerability). It's not the typical workflow, but it reminded me why I became a data engineer in the first place: curiosity about how systems work and the drive to build solutions that matter.
You can watch the 5-minute demo that I entered into the hackathon here:
Databricks Free Edition Hackathon: Theoretical Solar Flare Grid Impact Intelligence System
Thanks for reading!