Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Handle updates from bronze to silver table stream

Trilleo
New Contributor III

Hi Databricks Community, 

 

I am trying to stream from a bronze table to a silver table, but the bronze table may receive updates. Delta table streaming reads and writes do not support skipChangeCommits=false, i.e. they cannot handle modified records.

I need to handle updates in the bronze table and update the matching records in the silver table. My issue is that a record may be re-ingested into Databricks with only a subset of its columns changed, and I do not want that to add a second entry.

TL;DR: How do I propagate updates from the bronze table (source) to the silver table (target) when streaming between Delta tables?

 

1 ACCEPTED SOLUTION


tyler-xorbix
New Contributor III

Hi Trilleo,

By default, a Delta Live Table acts as a materialized view, meaning it is refreshed automatically from the datasets it depends on. Upstream changes therefore propagate without manual intervention.

Your scenario is more involved, though: it calls for custom merge logic driven by a Change Data Capture (CDC) feed.

CDC tracks row-level changes (inserts, updates, and deletes) in a table; in Delta Lake this is exposed through the change data feed. To reflect updates from your bronze table in the silver table, you will need to implement custom merge logic: read the stream of changes from the bronze table and, for each micro-batch, run a merge that updates or inserts records in the silver table based on a key column.
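As a rough illustration of the merge logic described above, here is a minimal PySpark sketch for notebooks without Delta Live Tables. It is not taken from the thread. It assumes the change data feed is enabled on the bronze table, that silver has the same columns as bronze, and that PySpark 3.3+ with the delta-spark package is available (for `batch_df.sparkSession` and `DeltaTable`). The helper names (`merge_condition`, `update_assignments`, `make_upsert`, `start_bronze_to_silver`), table names, key columns and checkpoint path are all hypothetical.

```python
def merge_condition(keys, target="t", source="s"):
    """Build the MERGE join condition, e.g. 't.id = s.id'."""
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def update_assignments(columns, source="s"):
    """Map each target column to the source expression that overwrites it."""
    return {c: f"{source}.{c}" for c in columns}


def make_upsert(silver_table, keys, update_columns):
    """Return a foreachBatch handler that merges one micro-batch into silver."""
    def upsert(batch_df, batch_id):
        # Imported lazily so the helpers above work without a Spark install.
        from delta.tables import DeltaTable
        from pyspark.sql import Window, functions as F

        # Keep inserts and post-update images; pre-images and deletes are
        # ignored in this sketch.
        changes = batch_df.filter(
            F.col("_change_type").isin("insert", "update_postimage"))

        # MERGE fails if several source rows match one target row, so keep
        # only the most recent change per key within this micro-batch.
        newest = Window.partitionBy(*keys).orderBy(
            F.col("_commit_version").desc())
        latest = (changes
                  .withColumn("_rn", F.row_number().over(newest))
                  .filter("_rn = 1")
                  .drop("_rn", "_change_type",
                        "_commit_version", "_commit_timestamp"))

        (DeltaTable.forName(batch_df.sparkSession, silver_table).alias("t")
         .merge(latest.alias("s"), merge_condition(keys))
         .whenMatchedUpdate(set=update_assignments(update_columns))
         .whenNotMatchedInsertAll()
         .execute())
    return upsert


def start_bronze_to_silver(spark, bronze_table, silver_table, keys,
                           update_columns, checkpoint_path):
    """Stream bronze's change data feed and upsert each batch into silver."""
    return (spark.readStream.format("delta")
            .option("readChangeFeed", "true")
            .table(bronze_table)
            .writeStream
            .foreachBatch(make_upsert(silver_table, keys, update_columns))
            .option("checkpointLocation", checkpoint_path)
            .start())
```

A call might look like `start_bronze_to_silver(spark, "bronze", "silver", ["id"], ["price", "updated_at"], "/tmp/checkpoints/silver")`. Only the listed columns are overwritten on a match, which fits the case where a re-ingested record changes a subset of its columns.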

Here is a Databricks blog post that gives an overview of CDC with custom merge logic: Change Data Capture With Delta Live Tables - The Databricks Blog.

With this approach, updates in the bronze table are applied to the matching rows in the silver table instead of being appended as duplicate entries.

Let me know if you need further clarification or assistance!


4 REPLIES

Trilleo
New Contributor III

Hi Tyler,

Thanks for the quick response. That was going to be my next approach as well. I had also come across the article you refer to. However, we are currently using "plain" PySpark in notebooks rather than Delta Live Tables, so I am not sure it can be used directly, but I am certain I can take some inspiration from it.

 Thank you. 

Trilleo
New Contributor III

A quick follow-up in case anyone else sees this post. I found these articles helpful:
How to Simplify CDC With Delta Lake's Change Data Feed 

Use Delta Lake change data feed on Databricks 
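For readers who have not used the change data feed before, here is a small sketch of turning it on and inspecting what it records. It assumes an existing Delta table and a live `spark` session. The function names and the table name `bronze` are hypothetical. Note that changes are only recorded from the version at which the feed was enabled, so `starting_version` should not predate that.

```python
def enable_cdf_sql(table_name):
    """SQL that turns on the change data feed for an existing Delta table."""
    return (f"ALTER TABLE {table_name} "
            "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")


def read_changes(spark, table_name, starting_version):
    """Batch-read the row-level changes recorded since starting_version."""
    return (spark.read.format("delta")
            .option("readChangeFeed", "true")
            .option("startingVersion", starting_version)
            .table(table_name))


def count_changes_by_type(changes_df):
    """Count rows per _change_type (insert, update_preimage,
    update_postimage, delete)."""
    return changes_df.groupBy("_change_type").count()
```

In a notebook this could be used as `spark.sql(enable_cdf_sql("bronze"))` once, and later `count_changes_by_type(read_changes(spark, "bronze", v)).show()`, where `v` is a table version at or after the one that enabled the feed.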

Himali_K
New Contributor II

Hi, 

You can use the DLT apply_changes API to deal with a changing source.

Delta Live Tables Python language reference | Databricks on AWS
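A minimal sketch of what that could look like inside a Delta Live Tables pipeline follows. It assumes the source dataset is defined in the same pipeline and delivers change records as an append-only stream, and that there is a column that orders changes for each key. The dataset names, key column, sequencing column and the two helper functions are hypothetical. The `dlt` module can only be imported when the code runs as part of a pipeline.

```python
def apply_changes_settings(target, source, keys, sequence_column):
    """Collect the keyword arguments passed to dlt.apply_changes."""
    return {
        "target": target,                # streaming table that receives upserts
        "source": source,                # dataset carrying the change records
        "keys": list(keys),              # columns that identify a record
        "sequence_by": sequence_column,  # orders changes for the same key
        "stored_as_scd_type": 1,         # overwrite in place, keep no history
    }


def define_silver(settings):
    """Declare the target table and the upsert flow (pipeline context only)."""
    import dlt  # available only inside a Delta Live Tables pipeline

    dlt.create_streaming_table(settings["target"])
    dlt.apply_changes(**settings)


# Hypothetical names; define_silver(settings) would be called in pipeline code.
settings = apply_changes_settings("silver", "bronze_changes", ["id"], "updated_at")
```

With SCD type 1, a re-ingested record with the same key updates the existing silver row rather than adding another entry.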

Thank you
