Hi everyone,
I’m implementing Structured Streaming in Databricks to handle Change Data Capture (CDC) as part of a Medallion Architecture (Bronze, Silver, and Gold layers). Microsoft’s documentation covers the approach in theory, but I’m looking for hands-on examples or code snippets that you’ve used successfully in a real-world project.
Specifically, I’d like to understand:
- How to ingest data into a Delta table (Bronze layer) using Auto Loader or another streaming method.
- How to process this data incrementally to capture changes (CDC) and propagate them to the Silver and Gold layers.
- Any recommendations for configurations or optimizations to manage schema evolution and large datasets effectively.
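To make the second point concrete, here is a minimal plain-Python sketch of the semantics I mean by "applying CDC" to a Silver table: within each micro-batch, keep only the latest change per key, then upsert or delete. This is not Spark code, and the field names (`id`, `op`, `seq`, `data`) and the function `apply_cdc_batch` are hypothetical, chosen only for illustration. My understanding is that on Databricks this would map to a Delta `MERGE` run per micro-batch (for example via `foreachBatch`), but that mapping is an assumption on my part.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical change-record shape (an assumption, not a Databricks format):
#   {"id": key, "op": "insert" | "update" | "delete", "seq": int, "data": dict or None}
Change = Dict[str, Any]
Table = Dict[Any, Optional[Dict[str, Any]]]


def apply_cdc_batch(target: Table, changes: Iterable[Change]) -> Table:
    """Apply one micro-batch of change records to a keyed table.

    Only the change with the highest `seq` per key is applied, so
    out-of-order records inside the batch do not overwrite newer ones.
    """
    # Step 1: deduplicate the batch, keeping the latest change per key.
    latest: Dict[Any, Change] = {}
    for change in changes:
        key = change["id"]
        if key not in latest or change["seq"] > latest[key]["seq"]:
            latest[key] = change

    # Step 2: upsert or delete against a copy of the target.
    result = dict(target)
    for key, change in latest.items():
        if change["op"] == "delete":
            result.pop(key, None)
        else:  # "insert" and "update" are both treated as an upsert
            result[key] = change["data"]
    return result


silver = {1: {"name": "a"}}
batch = [
    {"id": 1, "op": "update", "seq": 2, "data": {"name": "b"}},
    {"id": 1, "op": "update", "seq": 1, "data": {"name": "stale"}},
    {"id": 2, "op": "insert", "seq": 3, "data": {"name": "c"}},
    {"id": 2, "op": "delete", "seq": 4, "data": None},
    {"id": 3, "op": "insert", "seq": 5, "data": {"name": "d"}},
]
silver = apply_cdc_batch(silver, batch)
print(silver)
```

What I’m unsure about is how to express this reliably at scale with streaming Delta tables, especially the deduplication step and delete handling.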
If you have experience with this and can share practical examples or insights beyond the documentation, I’d greatly appreciate it!
Thank you in advance!