Seeking Practical Example for Structured Streaming with Delta Tables in Medallion Architecture

JissMathew
Valued Contributor

Hi everyone,

I'm working on implementing Structured Streaming in Databricks to process Change Data Capture (CDC) data as part of a Medallion Architecture (Bronze, Silver, and Gold layers). While Microsoft's documentation provides a theoretical approach, I'm looking for hands-on examples or code snippets that you've successfully used in a real-world project.

Specifically, Iโ€™d like to understand:

  1. How to ingest data into a Delta table (Bronze layer) using Auto Loader or another streaming method.
  2. How to process this data incrementally to capture changes (CDC) and propagate them to the Silver and Gold layers.
  3. Any recommendations for configurations or optimizations to manage schema evolution and large datasets effectively.

If anyone has experience with this and can share practical examples or insights beyond the documentation, it would be greatly appreciated!

Thank you in advance!

Jiss Mathew
India
Accepted Solution

szymon_dybczak
Esteemed Contributor III

Hi @JissMathew ,

Do you have access to Databricks Academy? I believe their data engineering track has plenty of example notebooks.
Alternatively, you can try dbdemos. For example, here you can find a demo notebook for Auto Loader:

Databricks Autoloader (cloudfile)

If you'd like to test it on your Databricks workspace, just run the following:

%pip install dbdemos

import dbdemos
dbdemos.install('auto-loader')
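For something closer to the Bronze step in the question, here is a minimal PySpark sketch of an Auto Loader ingest, assuming JSON files landing in cloud storage. It is not the code from the dbdemos notebook. The helper names (`autoloader_options`, `start_bronze_stream`), paths and table names are hypothetical placeholders; the `cloudFiles.*` keys are standard Auto Loader options, and the `cloudFiles` source itself only exists on Databricks.

```python
from typing import Dict


def autoloader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Build Auto Loader (cloudFiles) reader options for a Bronze ingest."""
    return {
        # Format of the incoming files (json, csv, parquet, ...).
        "cloudFiles.format": file_format,
        # Auto Loader stores the inferred schema here and reuses it on restart.
        "cloudFiles.schemaLocation": schema_location,
        # Infer column types instead of reading every JSON/CSV field as a string.
        "cloudFiles.inferColumnTypes": "true",
        # When new columns appear, the stream stops once and resumes with the
        # widened schema after a restart (e.g. an automatic job retry).
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def start_bronze_stream(spark, source_path: str, bronze_table: str,
                        schema_location: str, checkpoint_location: str,
                        file_format: str = "json"):
    """Incrementally load new files from source_path into a Bronze Delta table.

    Hypothetical helper; must run on Databricks (cloudFiles is not in OSS Spark).
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, schema_location))
        .load(source_path)
        .writeStream
        # Stream state, including which files were already ingested, lives here.
        .option("checkpointLocation", checkpoint_location)
        # Let the Bronze table pick up columns added by schema evolution.
        .option("mergeSchema", "true")
        # Process everything currently available, then stop (scheduled runs).
        .trigger(availableNow=True)
        .toTable(bronze_table)
    )
```

On a cluster this might be called as `start_bronze_stream(spark, "/Volumes/main/raw/orders", "main.bronze.orders", "/Volumes/main/raw/_schemas/orders", "/Volumes/main/raw/_checkpoints/orders")`, where every path and name is made up. The `availableNow` trigger (Spark 3.3+) suits scheduled jobs; removing it leaves a continuously running micro-batch stream.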

For a CDC pipeline, you can use the following:

CDC Pipeline With Delta | Databricks
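Again only as a sketch, not the demo's code: one common way to propagate CDC rows from Bronze to Silver is `foreachBatch` plus a Delta `MERGE` that keeps only the newest change per key in each micro-batch. The table names, the key `id`, the timestamp column `event_ts` and an `operation` column whose value `'DELETE'` marks deletes are all assumptions about the CDC feed. `UPDATE SET *` / `INSERT *` also assume the Silver table has the same columns as the feed, and `QUALIFY` is Databricks SQL syntax.

```python
def build_cdc_merge_sql(target: str, source_view: str, key: str,
                        ts_col: str, op_col: str = "operation") -> str:
    """Build a MERGE that applies the latest CDC event per key to target."""
    return f"""
MERGE INTO {target} AS t
USING (
  SELECT * FROM {source_view}
  -- keep only the newest change for each key in this micro-batch
  QUALIFY ROW_NUMBER() OVER (PARTITION BY {key} ORDER BY {ts_col} DESC) = 1
) AS s
ON t.{key} = s.{key}
WHEN MATCHED AND s.{op_col} = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED AND s.{op_col} != 'DELETE' THEN INSERT *
""".strip()


def upsert_to_silver(batch_df, batch_id):
    """foreachBatch handler: MERGE one micro-batch of Bronze CDC rows into Silver."""
    batch_df.createOrReplaceTempView("bronze_cdc_batch")
    # Run SQL on the micro-batch's own session so the temp view is visible
    # (DataFrame.sparkSession requires PySpark 3.3+).
    batch_df.sparkSession.sql(
        build_cdc_merge_sql("silver_customers", "bronze_cdc_batch", "id", "event_ts")
    )


def start_silver_stream(spark, checkpoint_location: str):
    """Stream new Bronze rows into Silver, one MERGE per micro-batch (hypothetical tables)."""
    return (
        spark.readStream.table("bronze_customers_cdc")
        .writeStream
        .foreachBatch(upsert_to_silver)
        .option("checkpointLocation", checkpoint_location)
        .trigger(availableNow=True)
        .start()
    )
```

This does not guard against events for the same key arriving out of order across micro-batches; adding a condition such as `s.event_ts > t.event_ts` to the `WHEN MATCHED` clauses is one way to handle that. To carry Silver changes on to Gold incrementally, Delta's Change Data Feed (table property `delta.enableChangeDataFeed = true`, read with `.option("readChangeFeed", "true")`) is one option.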

JissMathew

Hi @szymon_dybczak, thank you very much. Your reply gave me an excellent reference solution. I had been struggling with Structured Streaming, and your help was incredibly valuable and insightful.

Jiss Mathew
India