
Seeking Practical Example for Structured Streaming with Delta Tables in Medallion Architecture

JissMathew
New Contributor II

Hi everyone,

I'm implementing Structured Streaming in Databricks to handle Change Data Capture (CDC) as part of a Medallion Architecture (Bronze, Silver, and Gold layers). While Microsoft's documentation provides a theoretical approach, I'm looking for hands-on examples or code snippets that you've successfully used in a real-world project.

Specifically, I'd like to understand:

  1. How to ingest data into a Delta table (Bronze layer) using Auto Loader or another streaming method.
  2. How to process this data incrementally to create CDC and propagate changes to Silver and Gold layers.
  3. Any recommendations for configurations or optimizations to manage schema evolution and large datasets effectively.

If anyone has experience with this and can share practical examples or insights beyond the documentation, it would be greatly appreciated!

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Contributor III

Hi @JissMathew ,

Do you have access to Databricks Academy? I believe the data engineering track there has plenty of example notebooks.
Or you can try dbdemos. For example, here is a demo notebook for Auto Loader:

Databricks Autoloader (cloudfile)

If you'd like to test it on your Databricks instance, just run the following:

%pip install dbdemos
import dbdemos
dbdemos.install('auto-loader')
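
To give a rough idea of what a Bronze ingest with Auto Loader can look like, here is a minimal sketch. It is not taken from the dbdemos notebook; the function names, paths, table name, and the JSON source format are all placeholders I made up for illustration, and it assumes a Databricks runtime where the `cloudFiles` source is available.

```python
def bronze_autoloader_options(schema_location, file_format="json"):
    """Build Auto Loader options for a Bronze ingest (values are illustrative)."""
    return {
        "cloudFiles.format": file_format,
        # Auto Loader stores the inferred schema and its history here.
        "cloudFiles.schemaLocation": schema_location,
        # Pick up new columns when they appear in incoming files.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def start_bronze_stream(spark, source_path, checkpoint_path, target_table):
    """Incrementally load new files from source_path into a Bronze Delta table.

    `spark` is the active SparkSession in a Databricks notebook; the other
    arguments are hypothetical paths and names supplied by the caller.
    """
    options = bronze_autoloader_options(schema_location=checkpoint_path + "/schema")
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        # Let the Delta sink accept columns added by schema evolution.
        .option("mergeSchema", "true")
        # Process everything currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

With `addNewColumns`, the stream stops when it first sees a new column and picks up the updated schema on restart, so in practice this is usually run as a job that retries on failure.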

For a CDC pipeline, you can use the following:

CDC Pipeline With Delta | Databricks
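
For the Bronze-to-Silver step, one common pattern is to stream from the Bronze table and apply each micro-batch to Silver with a Delta `MERGE` inside `foreachBatch`. The sketch below is only an illustration of that pattern, not the code from the linked demo. It assumes the CDC feed has an `id` key, an `operation` column with values such as `DELETE`, and an `operation_date` timestamp, that the Silver table already exists with the same columns, and that `batch_df.sparkSession` is available (PySpark 3.3+). All names are hypothetical.

```python
def merge_condition(key_cols):
    """Join condition between target alias `t` and source alias `s`."""
    return " AND ".join(f"t.{c} = s.{c}" for c in key_cols)


def make_upsert_fn(target_table, key_cols, ts_col="operation_date", op_col="operation"):
    """Return a foreachBatch function that merges CDC rows into target_table."""

    def upsert(batch_df, batch_id):
        # Imported here so the helpers above stay usable without Spark installed.
        from delta.tables import DeltaTable
        from pyspark.sql import Window, functions as F

        # Keep only the most recent change per key within this micro-batch,
        # because MERGE fails if several source rows match one target row.
        w = Window.partitionBy(*key_cols).orderBy(F.col(ts_col).desc())
        latest = (
            batch_df.withColumn("_rn", F.row_number().over(w))
            .filter("_rn = 1")
            .drop("_rn")
        )

        target = DeltaTable.forName(batch_df.sparkSession, target_table)
        (
            target.alias("t")
            .merge(latest.alias("s"), merge_condition(key_cols))
            .whenMatchedDelete(condition=f"s.{op_col} = 'DELETE'")
            .whenMatchedUpdateAll(condition=f"s.{ts_col} > t.{ts_col}")
            .whenNotMatchedInsertAll(condition=f"s.{op_col} != 'DELETE'")
            .execute()
        )

    return upsert


def start_silver_stream(spark, bronze_table, silver_table, checkpoint_path):
    """Stream new Bronze rows and upsert them into the Silver table."""
    return (
        spark.readStream.table(bronze_table)
        .writeStream
        .foreachBatch(make_upsert_fn(silver_table, ["id"]))
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .start()
    )
```

The same approach can be repeated from Silver to Gold, although Gold tables are often aggregates, so a scheduled batch rebuild may be simpler there than another streaming merge.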

2 REPLIES 2

Hi @szymon_dybczak , Thank you very much. Your reply provided me with an excellent reference solution. I had been struggling with structured streaming, and your help was incredibly valuable and insightful.
