What if a data pipeline could explain why it failed instead of just saying that it failed? 👀
While learning Databricks and exploring data engineering, I built an AI-Powered Autonomous Data Reliability Platform on Databricks Free Edition using:
🔹 PySpark
🔹 Delta Lake
🔹 Databricks Workflows & Dashboards
🔹 Metadata-driven validation framework
🔹 Gemini LLM integration for AI-powered root cause analysis
The platform dynamically validates large-scale data, detects anomalies, monitors pipeline quality, and generates intelligent remediation insights using Generative AI.
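To make "metadata-driven validation" concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the rule format, the `RULES` and `CHECKS` names, and the sample rows are all hypothetical, and a real version would run the same idea over PySpark DataFrames and Delta tables.

```python
# Hypothetical rule metadata. In a real pipeline this could live in a
# config file or a Delta table instead of being hard-coded.
RULES = [
    {"column": "order_id", "check": "not_null"},
    {"column": "amount", "check": "min", "value": 0},
    {"column": "status", "check": "allowed", "value": ["NEW", "PAID", "CANCELLED"]},
]

# Each check takes (cell_value, rule_value) and returns True when the row passes.
CHECKS = {
    "not_null": lambda v, _: v is not None,
    "min": lambda v, limit: v is not None and v >= limit,
    "allowed": lambda v, allowed: v in allowed,
}


def validate(rows, rules):
    """Apply every rule to every row and count the failing rows per rule."""
    results = []
    for rule in rules:
        check = CHECKS[rule["check"]]
        failed = sum(
            1
            for row in rows
            if not check(row.get(rule["column"]), rule.get("value"))
        )
        results.append(
            {
                "column": rule["column"],
                "check": rule["check"],
                "failed": failed,
                "total": len(rows),
            }
        )
    return results


# Made-up sample data for illustration only.
rows = [
    {"order_id": 1, "amount": 25.0, "status": "PAID"},
    {"order_id": None, "amount": -5.0, "status": "PAID"},
    {"order_id": 3, "amount": 10.0, "status": "UNKNOWN"},
]

results = validate(rows, RULES)
for r in results:
    print(r)
```

The point of the design is that adding a new validation means adding a row of metadata, not writing new pipeline code.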
One of my favorite parts of this project was integrating the Gemini LLM to turn traditional monitoring into an intelligent observability system 🚀
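As a rough illustration of the LLM step, the sketch below turns failed validation results into a root-cause prompt. The function name `build_rca_prompt`, the prompt wording, and the sample pipeline name are assumptions made for this example, not the project's actual prompt. Sending the prompt to Gemini is intentionally left out, since the post does not describe the client setup.

```python
def build_rca_prompt(pipeline_name, failed_checks):
    """Build a plain-text root-cause-analysis prompt from failed checks.

    failed_checks is a list of dicts with the hypothetical keys
    'column', 'check', 'failed', and 'total'.
    """
    lines = [
        f"Pipeline '{pipeline_name}' failed data quality validation.",
        "Failed checks:",
    ]
    for c in failed_checks:
        lines.append(
            f"- column={c['column']} check={c['check']} "
            f"failed_rows={c['failed']} of {c['total']}"
        )
    lines.append(
        "Explain the most likely root cause and suggest a remediation step."
    )
    return "\n".join(lines)


# Made-up example input for illustration only.
prompt = build_rca_prompt(
    "orders_silver",
    [{"column": "amount", "check": "min", "failed": 1, "total": 3}],
)
print(prompt)
```

Passing structured failure details instead of a raw stack trace gives the model the context it needs to explain why a run failed, not just that it failed.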
This project helped me learn:
- workflow orchestration
- scalable validation design
- AI integration in data engineering
- observability concepts
- Medallion Architecture using Databricks
I would love to hear thoughts and feedback from the community!
GitHub Repository:
Som-115 (vaishnavi)
Demo Video:
https://drive.google.com/file/d/1-7s-idbJmSRdjPlSTPAy2tEsW_4mS-AQ/view?usp=sharing
#Databricks #DAIS2026 #DataEngineering #GenerativeAI #PySpark #DeltaLake #AI #LLM #DataObservability