
Building a Theoretical Solar Flare Intelligence System for the Databricks Free Edition Hackathon

zoe_unifeye
New Contributor II

I recently built a Theoretical Solar Flare Grid Impact Intelligence System for the Databricks Free Edition Hackathon 2025, and I wanted to share my journey building an end-to-end data engineering and ML solution on Databricks Free Edition.

Finding the Problem to Solve

This isn't the way round most people are used to working: usually a problem presents itself and you're tasked with finding the solution. However, with the hackathon having such a broad scope, I thought I'd flip that and find a problem to solve. As a data engineer I love to solve problems, understand the core concepts of whatever it is I'm looking at, and build a full picture. It's this aspect that also draws me to physics in my spare time, so I thought I'd use this hackathon as a way to dive into some data I find genuinely interesting: solar flares. I'm not an expert on solar flares, but knowing that they can wreak havoc on electrical systems, especially with aging grid infrastructure, I wondered, "What if we could predict these events days in advance and prepare for outages?" Et voilà, the problem to solve.

Building the Solution to the Problem

The Solution

A system that combines NASA space weather data with power grid monitoring data to provide predictive intelligence for grid operators through a natural language query interface.

Architecture Overview

Delta Live Tables Pipeline (Medallion Architecture):

🥉 Bronze Layer:

  • Ingested NASA space weather observations (solar flare class, intensity, timing)
  • Ingested power grid fault detection data (voltage, current, temperature, health scores)
  • Used Auto Loader for streaming data ingestion

🥈 Silver Layer:

  • Implemented data quality expectations with @dlt.expect_or_drop()
  • Validated timestamps, flare classifications, voltage ranges, temperature limits
  • Enriched data with severity classifications and temporal features
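
To make the Silver step concrete, here is a minimal plain-Python sketch of the same idea: records that fail a check are dropped (the behaviour an expect_or_drop expectation gives you), and the survivors are enriched with a severity label. It is not the pipeline's actual code. The field names (`observed_at`, `flare_class`) and the severity labels are assumptions for illustration; the base flux per class follows the standard GOES X-ray classification.

```python
import re
from datetime import datetime

# Peak X-ray flux (W/m^2) at the base of each GOES flare class.
CLASS_BASE_FLUX = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}
FLARE_PATTERN = re.compile(r"^([ABCMX])(\d+(?:\.\d+)?)$")

# Hypothetical severity labels, chosen only for this example.
SEVERITY_BY_CLASS = {"A": "quiet", "B": "quiet", "C": "minor",
                     "M": "moderate", "X": "severe"}


def flare_flux(flare_class):
    """Convert a class string such as 'M5.2' to peak flux, or None if malformed."""
    if not isinstance(flare_class, str):
        return None
    match = FLARE_PATTERN.match(flare_class.strip().upper())
    if match is None:
        return None
    letter, magnitude = match.groups()
    return CLASS_BASE_FLUX[letter] * float(magnitude)


def is_valid(record):
    """True only if the timestamp parses and the flare class is well formed."""
    try:
        datetime.fromisoformat(record["observed_at"])
    except (KeyError, TypeError, ValueError):
        return False
    return flare_flux(record.get("flare_class")) is not None


def to_silver(records):
    """Drop invalid records, then add peak flux and a severity label."""
    silver = []
    for record in records:
        if not is_valid(record):
            continue  # same effect as a dropped row in the real pipeline
        letter = record["flare_class"].strip().upper()[0]
        silver.append({
            **record,
            "peak_flux": flare_flux(record["flare_class"]),
            "severity": SEVERITY_BY_CLASS[letter],
        })
    return silver


bronze = [
    {"observed_at": "2025-01-01T10:00:00", "flare_class": "X1.5"},
    {"observed_at": "not-a-timestamp", "flare_class": "M2.0"},
    {"observed_at": "2025-01-02T10:00:00", "flare_class": "Z9"},
]
silver = to_silver(bronze)
```

Only the first record survives here: the second has an unparseable timestamp and the third an unknown flare class.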

🥇 Gold Layer:

  • Created correlation tables joining solar and grid data by date
  • Added temporal lag features (same-day, next-day, 2-3 days later) to capture delayed geomagnetic effects
  • Built ML-enriched tables with predictions and probability forecasts
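
The lag features can be sketched in plain Python as a date-keyed lookup: for each solar observation day, attach the grid measurement from the same day and from one, two, and three days later. This is a simplified stand-in for the join in the Gold tables, and the names (`peak_flux`, `faults_lag_Nd`) and the sample values are made up for illustration.

```python
from datetime import date, timedelta


def add_lag_features(solar_by_day, grid_by_day, lags=(0, 1, 2, 3)):
    """For each solar day, attach the grid value observed `lag` days later."""
    rows = []
    for day in sorted(solar_by_day):
        row = {"date": day, "peak_flux": solar_by_day[day]}
        for lag in lags:
            # None marks a day with no grid observation.
            row[f"faults_lag_{lag}d"] = grid_by_day.get(day + timedelta(days=lag))
        rows.append(row)
    return rows


# Illustrative values only, not real measurements.
solar_by_day = {date(2025, 1, 1): 1.5e-4, date(2025, 1, 2): 2.0e-6}
grid_by_day = {
    date(2025, 1, 1): 3,
    date(2025, 1, 2): 4,
    date(2025, 1, 3): 11,
    date(2025, 1, 4): 9,
}
gold_rows = add_lag_features(solar_by_day, grid_by_day)
```

Keeping a missing day as None rather than 0 avoids treating "no data" as "no faults" in later analysis.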

Key Features

1. Correlation Analysis:

  • Tracked how solar flare intensity correlates with grid voltage drops, temperature rises, and equipment health degradation
  • Implemented time-lagged features since geomagnetic storms take 24-72 hours to fully develop
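
As a rough illustration of lagged correlation, the sketch below computes the Pearson correlation between flare intensity on day t and a grid metric on day t + lag, for lags of 0 to 3 days. It assumes two equal-length daily series with no gaps, and the sample numbers are synthetic, built so that the grid series echoes the solar series two days later.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def lagged_correlations(flux, grid_metric, max_lag=3):
    """Correlate flux on day t with the grid metric on day t + lag."""
    result = {}
    for lag in range(max_lag + 1):
        xs = flux[: len(flux) - lag]   # days that still have a value `lag` days ahead
        ys = grid_metric[lag:]         # the grid metric shifted back by `lag` days
        result[lag] = pearson(xs, ys)
    return result


# Synthetic series: the grid metric repeats the flux pattern two days later.
flux = [1, 5, 2, 8, 3, 9, 1, 4]
faults = [0, 0, 1, 5, 2, 8, 3, 9]
correlations = lagged_correlations(flux, faults)
best_lag = max(correlations, key=correlations.get)
```

With real data the peak would be far less clean, and a strong correlation at some lag would still not show that flares caused the faults.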

2. ML Predictions:

  • Created scenario-based predictions: Quiet Sun → Severe X-class storms
  • Generated 7-day forecasts with risk levels and specific operational recommendations

3. Probabilistic Forecasting:

  • Calculated historical frequency of different flare classes (B, C, M, X)
  • Created probability forecasts for next 7 days
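
One simple way to turn historical frequencies into a 7-day forecast is sketched below. It assumes the history is a list holding each day's strongest flare class, and it treats days as independent, which is a strong simplification because solar activity clusters in time. The counts are invented for the example.

```python
from collections import Counter

FLARE_CLASSES = ("B", "C", "M", "X")


def daily_class_frequency(daily_max_class):
    """Share of historical days whose strongest flare fell in each class."""
    if not daily_max_class:
        raise ValueError("history must not be empty")
    counts = Counter(daily_max_class)
    total = len(daily_max_class)
    return {cls: counts.get(cls, 0) / total for cls in FLARE_CLASSES}


def prob_within(p_daily, days=7):
    """P(at least one such day in the window), assuming independent days."""
    return 1.0 - (1.0 - p_daily) ** days


# Invented history: 100 days, each labelled with its strongest flare class.
history = ["B"] * 50 + ["C"] * 30 + ["M"] * 15 + ["X"] * 5
frequency = daily_class_frequency(history)
forecast = {cls: prob_within(p) for cls, p in frequency.items()}
```

In this made-up history, X-class days occur 5% of the time, which gives roughly a 30% chance of at least one X-class day in the next week under the independence assumption.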

4. AI/BI Genie Integration:

  • Set up natural language query interface for grid operators
  • Sample queries:
    • "What happens if we get a severe X-class solar storm tomorrow?"
    • "Show me the most likely solar scenarios for the next 7 days"
    • "At what flare intensity should we activate emergency protocols?"
    • "Visualize daily faults and types"

Here's what the theoretical grid operators would get:

  • 7-day forecasts that show probability estimates for different solar scenarios (from quiet sun to severe X-class storms)
  • Clear risk thresholds - no guessing about when to escalate from "keep an eye on it" to "activate emergency protocols"
  • Specific action plans - not vague warnings, but concrete steps like "pre-position repair crews at substations" or "alert hospitals about potential outages"
  • Anomaly detection that flags unusual patterns - days when something weird is happening that needs investigation
  • Natural language queries via Genie - operators can ask questions in plain English and get instant answers
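
The anomaly flagging could be as simple as a z-score rule. The sketch below is one common approach and an assumption on my part, not necessarily the method used in the demo: it flags days whose value sits more than `threshold` sample standard deviations from the mean. The fault counts are illustrative.

```python
from statistics import mean, stdev


def flag_anomalies(daily_values, threshold=3.0):
    """Return indices of days more than `threshold` std devs from the mean."""
    if len(daily_values) < 2:
        return []
    mu = mean(daily_values)
    sigma = stdev(daily_values)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [
        index
        for index, value in enumerate(daily_values)
        if abs(value - mu) / sigma > threshold
    ]


# Illustrative daily fault counts with one obvious spike on the last day.
daily_faults = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
anomalous_days = flag_anomalies(daily_faults, threshold=2.0)
```

A lower threshold is used in the example because, in a short series, a single large spike inflates the standard deviation and can hide itself from a stricter cut-off.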

Tech Stack

Built entirely on Databricks Free Edition using:

  • Delta Live Tables for the pipeline orchestration
  • Auto Loader for streaming data ingestion
  • PySpark for data transformations
  • AI/BI Genie for natural language queries
  • Python ML libraries (RandomForest) for the predictive modeling

The Journey: Finding Problems Worth Solving

This hackathon gave me the freedom to work backwards - starting with fascinating data (solar flares) and discovering a problem worth solving (grid vulnerability). It's not the typical workflow, but it reminded me why I became a data engineer in the first place: curiosity about how systems work and the drive to build solutions that matter.

You can watch the 5-minute demo that I entered into the hackathon here:

Databricks Free Edition Hackathon: Theoretical Solar Flare Grid Impact Intelligence System

Thanks for reading!  

In this video, Zoe Booth, Senior Data Engineer at Unifeye, walks through her full end-to-end solution built for the Databricks Free Edition Hackathon. Zoe joined Unifeye at the start of the month, and she's already making an incredible impact. This demo shows exactly why. What she built: A ...

Raman_Unifeye
Contributor

Fabulous submission @zoe_unifeye, and good luck with the hackathon.