Streaming with Medallion Architecture and Star Schema - Help
02-03-2025 11:53 PM
What are the best practices for implementing non-stop streaming in a Medallion Architecture with a Star Schema?
Use Case:
We have operational data and need to enable near real-time reporting in Power BI, with a maximum latency of 3 minutes. No Delta Live Tables.
Key Questions:
How should we curate dimensions and facts when transitioning data from Silver to Gold using Structured Streaming?
Could you provide examples or proven approaches for fact-dimension joins in a streaming context?
How can we use CDC here?
If you have more questions or need clarification, I'm happy to answer.
02-04-2025 06:41 AM
Why not DLTs? This is kind of an ideal use case.
02-04-2025 09:21 AM
I've set up a near real-time (30-minute latency) streaming solution that ingests data from SQL Server into Delta Lake.
- Changes in the source SQL Server tables are captured using Change Data Capture (CDC) and written to CSV files in a data lake.
- A streaming process then reads these CSV files as they arrive and applies the changes to Delta tables.
- The delta (changes) is determined by comparing timestamps against a logging table.
- The entire process, from CDC extraction to streaming updates, is orchestrated using Databricks Workflows.
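The CSV-to-Delta step above can be sketched with Structured Streaming and `foreachBatch`, which MERGEs each micro-batch of captured changes into the target Delta table. All table names, paths, and the key/timestamp/operation columns here are assumptions for illustration, not the actual setup:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Read the CDC CSV files as they land in the lake. Streaming file
# sources require an explicit schema.
changes = (
    spark.readStream
    .format("csv")
    .option("header", "true")
    .schema("id INT, name STRING, op STRING, change_ts TIMESTAMP")
    .load("/lake/cdc/customers/")  # hypothetical landing path
)

def apply_changes(batch_df, batch_id):
    from pyspark.sql import functions as F, Window
    # Keep only the latest change per key within this micro-batch,
    # so out-of-order rows in the same batch don't clobber each other.
    latest = (
        batch_df
        .withColumn("rn", F.row_number().over(
            Window.partitionBy("id").orderBy(F.col("change_ts").desc())))
        .filter("rn = 1")
        .drop("rn")
    )
    target = DeltaTable.forName(batch_df.sparkSession, "silver.customers")
    (target.alias("t")
     .merge(latest.alias("s"), "t.id = s.id")
     .whenMatchedDelete(condition="s.op = 'D'")
     .whenMatchedUpdateAll(condition="s.op <> 'D'")
     .whenNotMatchedInsertAll(condition="s.op <> 'D'")
     .execute())

(changes.writeStream
 .foreachBatch(apply_changes)
 .option("checkpointLocation", "/lake/_checkpoints/customers")
 .start())
```

The checkpoint location is what makes the stream restartable without reprocessing files already applied.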
I recommend DLT as the best solution for your use case; otherwise, use the traditional approach below:
- Change Data Feed
- Structured Streaming
- 3-minute micro-batches
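The three pieces above fit together roughly like this: stream the Silver table's Change Data Feed, join it against the dimension as a static Delta table to resolve surrogate keys (a stream-static join re-reads the latest dimension snapshot each micro-batch), and write to the Gold fact table on a 3-minute trigger. A sketch, with all table and column names assumed for illustration:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Stream changes from Silver via Change Data Feed; the table must have
# delta.enableChangeDataFeed = true set.
fact_changes = (
    spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .table("silver.orders")
    .filter(F.col("_change_type").isin("insert", "update_postimage"))
)

# Dimension read as a static Delta table: in a stream-static join,
# each micro-batch sees the dimension's current version.
dim_customer = spark.read.table("gold.dim_customer")

gold_fact = (
    fact_changes
    .join(dim_customer, "customer_id", "left")  # resolve surrogate key
    .select("order_id", "customer_sk", "amount", "order_ts")
)

(gold_fact.writeStream
 .trigger(processingTime="3 minutes")  # matches the 3-minute latency target
 .option("checkpointLocation", "/lake/_checkpoints/gold_fact_orders")
 .toTable("gold.fact_orders"))
```

Note that a stream-static join handles type-1 dimensions naturally; for type-2 (SCD2) dimensions you would instead filter the dimension to current rows or match on effective-date ranges inside a `foreachBatch`.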

