🚀 LDP Tax Pipeline — Spark Declarative Pipelines on macOS (Without Databricks)
Excited to share my latest hands-on project: a LakeFlow Declarative Pipeline (LDP) built locally with Apache Spark 4.1 and its Spark Declarative Pipelines (SDP) framework. It runs entirely on macOS, is scheduled with a simple cron job, and follows a production-style Medallion Architecture.
🔹 What this pipeline demonstrates:
📥 Raw tax data ingestion from AWS S3
🥉 Bronze layer for raw structured ingestion
🥈 Silver layer for cleansing & enrichment
🥇 Gold layer for aggregations & analytics
💾 Persistence to Azure Data Lake Storage Gen2
⏱️ Fully scheduled using cron (no external orchestrator)
🚫 No Databricks runtime — pure Spark 4.1 SDP
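To make the Bronze → Silver → Gold flow above concrete, here is a minimal plain-Python sketch of the layering idea. It is not the pipeline's actual code and does not use the Spark API: the sample records, the column names (`taxpayer_id`, `state`, `amount`) and the function names are all hypothetical, chosen only for illustration. In the real pipeline each layer would presumably be a declarative Spark dataset rather than a Python list.

```python
from collections import defaultdict

# Hypothetical raw records, standing in for tax files landed in S3.
RAW_ROWS = [
    {"taxpayer_id": "T1", "state": "ca", "amount": "1200.50"},
    {"taxpayer_id": "T2", "state": "NY", "amount": "800"},
    {"taxpayer_id": "T3", "state": "CA", "amount": "not-a-number"},
    {"taxpayer_id": "T4", "state": "ny", "amount": "199.50"},
]


def bronze(raw_rows):
    """Bronze: keep every record as ingested, only tagging its source."""
    return [dict(row, source="s3") for row in raw_rows]


def silver(bronze_rows):
    """Silver: drop rows with unparseable amounts and normalize fields."""
    cleaned = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing step: skip malformed amounts
        cleaned.append(
            {
                "taxpayer_id": row["taxpayer_id"],
                "state": row["state"].upper(),  # normalize state codes
                "amount": amount,
            }
        )
    return cleaned


def gold(silver_rows):
    """Gold: aggregate the cleansed amounts per state for analytics."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["state"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(gold(silver(bronze(RAW_ROWS))))
```

Each layer only reads the output of the one before it, which is the property that lets a declarative framework work out the execution order on its own.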
📖 Full walkthrough in this Medium article: https://medium.com/@bijumathewt/lakeflow-declarative-tax-pipeline-using-apache-spark-4-1-64c965914c3...