Databricks Community

zoe_unifeye · 11-13-2025

I recently built a Theoretical Solar Flare Grid Impact Intelligence System for the Databricks Free Edition Hackathon 2025, and I wanted to share my journey building an end-to-end data engineering and ML solution on Databricks Free Edition.

Finding the Problem to Solve

This isn't the way round most people are used to working: usually a problem presents itself and you're tasked with finding the solution. However, since this was a hackathon with a broad scope, I thought I'd flip that and find a problem to solve. As a data engineer I love to solve problems, understand the core concepts of whatever it is I'm looking at, and build a full picture. It's this aspect that also draws me to physics in my spare time, so I thought I'd use this hackathon as a way to dive into some data I find genuinely interesting: solar flares. I'm not an expert on solar flares, but knowing that they can wreak havoc on electrical systems, especially aging grid infrastructure, I wondered, "What if we could predict these events days in advance and prepare for outages?" Et voilà, the problem to solve.

Building the Solution to the Problem

The Solution

A system that combines NASA space weather data with power grid monitoring data to provide predictive intelligence for grid operators through a natural language query interface.

Architecture Overview

Delta Live Tables Pipeline (Medallion Architecture):

🥉 Bronze Layer:

Ingested NASA space weather observations (solar flare class, intensity, timing)
Ingested power grid fault detection data (voltage, current, temperature, health scores)
Used Auto Loader for streaming data ingestion

🥈 Silver Layer:

Implemented data quality expectations with @dlt.expect_or_drop()
Validated timestamps, flare classifications, voltage ranges, temperature limits
Enriched data with severity classifications and temporal features
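The Silver-layer logic can be illustrated in plain Python. This is a minimal sketch, not the pipeline's actual DLT code: the record field names (`peak_time`, `class_type`, `timestamp`, `voltage`, `temperature`), the voltage and temperature limits, and the severity labels are all hypothetical. `expect_or_drop` here is a stand-in function that mimics what stacking several `@dlt.expect_or_drop()` decorators does: any record failing any expectation is dropped. The flare-class conversion follows the standard GOES X-ray scale, where each letter is a tenfold step in peak flux.

```python
import re
from datetime import datetime

# GOES X-ray classes: each letter is a decade of peak flux (W/m^2, 0.1-0.8 nm);
# the trailing number is a multiplier, e.g. "M2.5" = 2.5e-5 W/m^2.
CLASS_BASE_FLUX = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}
# Hypothetical severity labels used for enrichment.
SEVERITY = {"A": "minor", "B": "minor", "C": "moderate", "M": "strong", "X": "severe"}
FLARE_CLASS_PATTERN = re.compile(r"^[ABCMX]\d+(\.\d+)?$")


def is_valid_timestamp(value):
    """True if value parses as an ISO-8601 timestamp string."""
    try:
        datetime.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Each entry mirrors one named expectation: name -> predicate on a record dict.
FLARE_RULES = {
    "valid_timestamp": lambda r: is_valid_timestamp(r.get("peak_time")),
    "valid_class": lambda r: bool(FLARE_CLASS_PATTERN.match(r.get("class_type") or "")),
}
GRID_RULES = {  # hypothetical limits, for illustration only
    "valid_timestamp": lambda r: is_valid_timestamp(r.get("timestamp")),
    "voltage_in_range": lambda r: r.get("voltage") is not None and 0 < r["voltage"] <= 500,
    "temperature_limit": lambda r: r.get("temperature") is not None and -40 <= r["temperature"] <= 120,
}


def expect_or_drop(records, rules):
    """Keep only records that pass every rule; failing records are dropped."""
    return [r for r in records if all(rule(r) for rule in rules.values())]


def enrich_flare(record):
    """Add peak flux and a severity label derived from the flare class."""
    letter, multiplier = record["class_type"][0], float(record["class_type"][1:])
    return {**record,
            "peak_flux_wm2": CLASS_BASE_FLUX[letter] * multiplier,
            "severity": SEVERITY[letter]}


# Toy records for illustration; only the first passes every flare rule.
raw = [
    {"peak_time": "2025-01-01T12:00:00", "class_type": "X1.5"},
    {"peak_time": "not-a-date", "class_type": "M1.0"},
    {"peak_time": "2025-01-02T08:30:00", "class_type": "Z9"},
]
silver = [enrich_flare(r) for r in expect_or_drop(raw, FLARE_RULES)]
print(silver)
```

Keeping each rule named, as the dictionary keys do here, matches how DLT reports expectation results per name, which makes it easier to see which check is dropping records.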

🥇 Gold Layer:

Created correlation tables joining solar and grid data by date
Added temporal lag features (same-day, next-day, 2-3 days later) to capture delayed geomagnetic effects
Built ML-enriched tables with predictions and probability forecasts
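The Gold-layer join with lag features can be sketched as follows. This is an illustration under assumptions, not the project's code: it takes daily aggregates as plain dictionaries keyed by `datetime.date`, and the metric names (`max_flux_wm2`, `fault_count`) are hypothetical. For each solar day it attaches the grid metrics observed on the same day and 1, 2 and 3 days later, leaving `None` where a grid day is missing, as a left join would.

```python
from datetime import date, timedelta


def build_lagged_rows(solar_daily, grid_daily, metrics, lags=(0, 1, 2, 3)):
    """Left-join daily solar aggregates to grid aggregates observed `lag`
    days later, producing one wide row per solar day.

    solar_daily / grid_daily: dict mapping datetime.date -> dict of metrics.
    """
    rows = []
    for day in sorted(solar_daily):
        row = {"date": day, **solar_daily[day]}
        for lag in lags:
            # Grid values `lag` days after the solar observation (None if absent).
            grid = grid_daily.get(day + timedelta(days=lag), {})
            for metric in metrics:
                row[f"{metric}_lag{lag}"] = grid.get(metric)
        rows.append(row)
    return rows


# Toy values purely for illustration.
solar = {date(2025, 1, 1): {"max_flux_wm2": 1.5e-4}}
grid = {date(2025, 1, 2): {"fault_count": 5}}
print(build_lagged_rows(solar, grid, metrics=["fault_count"]))
```

In the actual pipeline this would presumably be a PySpark join against a shifted date column (for example via `pyspark.sql.functions.date_add`) rather than a Python loop; the sketch only shows the shape of the resulting table.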

Key Features

1. Correlation Analysis:

Tracked how solar flare intensity correlates with grid voltage drops, temperature rises, and equipment health degradation
Implemented time-lagged features since geomagnetic storms take 24-72 hours to fully develop
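One way to check which delay matters most is to correlate solar intensity on day t with a grid metric on day t + lag for each lag. The sketch below assumes two equal-length, gap-free daily series aligned on the same start date and uses a plain Pearson correlation; the post doesn't specify the correlation measure, so treat this as one reasonable choice rather than the method used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # undefined if either series is constant


def lagged_correlations(solar, grid, max_lag=3):
    """Correlate solar intensity on day t with a grid metric on day t + lag.

    Both inputs are assumed to be equal-length, gap-free daily series.
    """
    n = len(solar)
    return {lag: pearson(solar[: n - lag], grid[lag:]) for lag in range(max_lag + 1)}


# Toy series: the grid metric simply echoes solar intensity two days later.
solar_series = [1, 5, 2, 8, 3, 9, 4, 7]
grid_series = [0, 0] + solar_series[:-2]
corr = lagged_correlations(solar_series, grid_series)
print(max(corr, key=corr.get))  # the lag with the strongest correlation
```

With real data, the lag with the highest correlation is a hint about the typical delay between a flare and its grid impact, not proof of a causal link.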

2. ML Predictions:

Created scenario-based predictions: Quiet Sun → Severe X-class storms
Generated 7-day forecasts with risk levels and specific operational recommendations

3. Probabilistic Forecasting:

Calculated historical frequency of different flare classes (B, C, M, X)
Created probability forecasts for next 7 days
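A simple way to turn historical frequencies into a 7-day probability, assuming each day is independent (a simplification, since flares tend to cluster around active regions): if a fraction p of historical days had at least one flare of a given class, the chance of at least one such day in the next 7 is 1 - (1 - p)^7. This is a sketch of that idea, not necessarily the calculation used in the project; the input format (one list of class letters per historical day) is an assumption.

```python
from collections import Counter


def daily_class_frequency(history, classes="BCMX"):
    """Fraction of historical days with at least one flare of each class.

    history: one list of class letters per day, e.g. [["C", "M"], [], ["B"]].
    """
    days_with_class = Counter()
    for day in history:
        for letter in set(day):  # count each class at most once per day
            days_with_class[letter] += 1
    return {c: days_with_class[c] / len(history) for c in classes}


def probability_within(p_daily, days=7):
    """P(at least one occurrence in `days` days), assuming independent days."""
    return 1 - (1 - p_daily) ** days


def seven_day_forecast(history, classes="BCMX"):
    """Per-class probability of at least one flare in the next 7 days."""
    freq = daily_class_frequency(history, classes)
    return {c: probability_within(p) for c, p in freq.items()}


# Toy history of 10 days, purely for illustration.
history = [["C", "C"], ["M"], [], ["C", "X"], ["B"], ["C"], [], ["M", "C"], ["B", "C"], []]
print(seven_day_forecast(history))
```

For example, an X-class flare on 1 day in 10 gives p = 0.1 and a 7-day probability of 1 - 0.9^7, or about 52%.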

4. AI/BI Genie Integration:

Set up natural language query interface for grid operators
Sample queries:
- "What happens if we get a severe X-class solar storm tomorrow?"
- "Show me the most likely solar scenarios for the next 7 days"
- "At what flare intensity should we activate emergency protocols?"
- "Visualize daily faults and types"

Here's what the theoretical grid operators would get:

7-day forecasts that show probability estimates for different solar scenarios (from quiet sun to severe X-class storms)
Clear risk thresholds - no guessing about when to escalate from "keep an eye on it" to "activate emergency protocols"
Specific action plans - not vague warnings, but concrete steps like "pre-position repair crews at substations" or "alert hospitals about potential outages"
Anomaly detection that flags unusual patterns - days when something weird is happening that needs investigation
Natural language queries via Genie - operators can ask questions in plain English and get instant answers
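The post doesn't say which anomaly-detection method was used. One simple possibility, sketched here as an assumption, is a z-score rule: flag any day whose value (for example, a daily fault count) lies more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the input shape (a dict of `datetime.date` to count) are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def flag_anomalies(daily_counts, threshold=2.0):
    """Return dates whose value lies more than `threshold` sample standard
    deviations from the mean of all days (a simple global z-score rule)."""
    values = list(daily_counts.values())
    mu, sigma = mean(values), stdev(values)  # stdev needs at least two days
    if sigma == 0:
        return []  # perfectly flat series: nothing stands out
    return sorted(d for d, v in daily_counts.items() if abs(v - mu) / sigma > threshold)


# Toy daily fault counts, purely for illustration.
start = date(2025, 11, 1)
faults = [3, 4, 2, 3, 5, 4, 3, 20, 4, 3]
counts = {start + timedelta(days=i): n for i, n in enumerate(faults)}
print(flag_anomalies(counts))
```

A single large spike inflates the standard deviation it is measured against, so a robust variant (median and median absolute deviation) or a rolling window may suit noisy grid data better.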

Tech Stack

Built entirely on Databricks Free Edition using:

Delta Live Tables for the pipeline orchestration
Auto Loader for streaming data ingestion
PySpark for data transformations
AI/BI Genie for natural language queries
Python ML libraries (Random Forest models) for the predictive modeling

The Journey: Finding Problems Worth Solving

This hackathon gave me the freedom to work backwards - starting with fascinating data (solar flares) and discovering a problem worth solving (grid vulnerability). It's not the typical workflow, but it reminded me why I became a data engineer in the first place: curiosity about how systems work and the drive to build solutions that matter.

You can watch the 5-minute demo I entered into the hackathon here:

Databricks Free Edition Hackathon: Theoretical Solar Flare Grid Impact Intelligence System

Thanks for reading!