
Handle updates from bronze to silver table stream

Trilleo
New Contributor II

Hi Databricks Community, 

 

I am trying to stream from a bronze table to a silver table, but the bronze table may receive updates. As far as I can tell, a Delta table streaming read fails on commits that modify existing records unless an option such as skipChangeCommits is set, and that option ignores those commits rather than handling the modified records.

I need to be able to handle updates in the bronze table and then update the matching records in the silver table. My issue is that a record may be re-ingested into Databricks with only a subset of its columns changed, and I do not want that to add another entry.

TL;DR: How do I handle updates in the bronze table (source) in the silver table (target) when streaming between Delta tables?

 

Accepted Solutions

tyler-xorbix
New Contributor II

Hi Trilleo,

By default, a Delta Live Table acts as a materialized view, meaning it automatically updates based on its dependencies. This functionality allows for straightforward handling of data changes and dependencies without additional manual intervention.

However, your scenario seems to involve a more complex situation. This requires creating custom merge logic with a Change Data Capture (CDC) feed.

Change data capture (CDC) tracks the changes (inserts, updates, and deletes) made to a table; in Delta Lake, the change data feed exposes them. To handle updates from your bronze table and ensure they are accurately reflected in the silver table, you will need to implement custom merge logic. This involves reading the streaming data from the bronze table and applying merge operations to update or insert records into the silver table based on a key column.
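
As a rough illustration of that merge step in plain PySpark, here is a minimal sketch that upserts each micro-batch into the silver table with foreachBatch. It is not taken from the linked blog. The table name, key columns, and ordering column are hypothetical parameters, the micro-batch is assumed to have the same columns as the silver table, and DataFrame.sparkSession assumes PySpark 3.3 or later.

```python
from typing import TYPE_CHECKING, Callable, Sequence

if TYPE_CHECKING:  # used for type hints only; Spark is imported lazily below
    from pyspark.sql import DataFrame


def merge_condition(key_columns: Sequence[str]) -> str:
    """Join predicate between the target (alias t) and the source (alias s)."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"t.`{c}` = s.`{c}`" for c in key_columns)


def make_upsert(
    silver_table: str, key_columns: Sequence[str], order_column: str
) -> Callable[["DataFrame", int], None]:
    """Return a foreachBatch handler that merges each micro-batch into silver.

    silver_table, key_columns and order_column are illustrative names that
    depend on your own schema.
    """

    def upsert(batch_df: "DataFrame", batch_id: int) -> None:
        from delta.tables import DeltaTable
        from pyspark.sql import Window, functions as F

        # MERGE fails if several source rows match one target row, so keep
        # only the newest row per key within the micro-batch.
        newest_first = Window.partitionBy(*key_columns).orderBy(
            F.col(order_column).desc()
        )
        latest = (
            batch_df.withColumn("_rn", F.row_number().over(newest_first))
            .filter(F.col("_rn") == 1)
            .drop("_rn")
        )
        (
            DeltaTable.forName(batch_df.sparkSession, silver_table)
            .alias("t")
            .merge(latest.alias("s"), merge_condition(key_columns))
            .whenMatchedUpdateAll()  # key already in silver: overwrite the row
            .whenNotMatchedInsertAll()  # new key: insert the row
            .execute()
        )

    return upsert
```

The handler would be attached with something like writeStream.foreachBatch(make_upsert("silver", ["id"], "ingested_at")) plus a checkpointLocation option, where "silver", "id" and "ingested_at" are placeholders. If only some columns should be overwritten, whenMatchedUpdate(set={...}) can replace whenMatchedUpdateAll().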

Here is a Databricks blog post that gives an overview of CDC with custom merge logic: Change Data Capture With Delta Live Tables - The Databricks Blog.

This approach ensures that updates in the bronze table are correctly reflected in the silver table without adding duplicate entries, providing a more tailored solution to handle your specific needs.

Let me know if you need further clarification or assistance!

4 REPLIES

Trilleo
New Contributor II

Hi Tyler,

Thanks for the quick response. That was going to be my next approach as well. I had also come across the article you refer to. However, we are currently using "plain" PySpark in notebooks and not Delta Live Tables, so I am not sure it can be used directly, but I am certain I can get some inspiration from it.

 Thank you. 

Trilleo
New Contributor II

A quick follow-up in case anyone else sees this post. I found these articles helpful:
How to Simplify CDC With Delta Lake's Change Data Feed 

Use Delta Lake change data feed on Databricks 
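
For anyone who wants a starting point, here is a minimal sketch of reading the bronze table's change data feed as a stream in plain PySpark. The helper names and the table name passed in are my own placeholders, not from the articles. It assumes the change data feed has been enabled on the bronze table, since changes are only recorded from that point on.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:  # used for type hints only; Spark is imported lazily below
    from pyspark.sql import DataFrame, SparkSession

# Change types that carry the current state of a row. "update_preimage" holds
# the old values and "delete" marks removals, so both are left out here.
UPSERT_CHANGE_TYPES = ("insert", "update_postimage")


def enable_cdf_sql(table: str) -> str:
    """SQL statement that turns on the change data feed for an existing table."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"


def read_changes(
    spark: "SparkSession", table: str, starting_version: Optional[int] = None
) -> "DataFrame":
    """Streaming DataFrame over the row-level changes of a Delta table."""
    reader = spark.readStream.option("readChangeFeed", "true")
    if starting_version is not None:
        # Must not be earlier than the version where the feed was enabled.
        reader = reader.option("startingVersion", starting_version)
    return reader.table(table)


def keep_upserts(changes: "DataFrame") -> "DataFrame":
    """Drop pre-update images and deletes, keeping rows to insert or update."""
    from pyspark.sql import functions as F

    return changes.filter(F.col("_change_type").isin(*UPSERT_CHANGE_TYPES))
```

Usage would be roughly spark.sql(enable_cdf_sql("bronze")) once, then keep_upserts(read_changes(spark, "bronze")) as the streaming source. Each row carries the extra columns _change_type, _commit_version and _commit_timestamp, so _commit_version can be used to pick the latest change per key before merging into silver.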

Himali_K
New Contributor II

Hi, 

You can use the DLT apply_changes function to deal with a changing source.

Delta Live Tables Python language reference | Databricks on AWS
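
A minimal sketch of what that could look like, assuming the bronze table has its change data feed enabled and is keyed by a column named id. The table names, the view name, and the key column are placeholders, and the dlt module is only importable inside a Delta Live Tables pipeline.

```python
from typing import Any, Dict, Sequence


def apply_changes_args(
    target: str, source: str, keys: Sequence[str], sequence_by: str
) -> Dict[str, Any]:
    """Keyword arguments for dlt.apply_changes (SCD type 1: update in place)."""
    return {
        "target": target,
        "source": source,
        "keys": list(keys),
        "sequence_by": sequence_by,  # orders competing changes for one key
        # Keep the change feed metadata columns out of the silver table.
        "except_column_list": ["_change_type", "_commit_version", "_commit_timestamp"],
        "stored_as_scd_type": 1,
    }


def define_silver(spark) -> None:
    """Declare the silver table; call this from the pipeline source file."""
    import dlt  # only available inside a Delta Live Tables pipeline

    @dlt.view(name="bronze_changes")
    def bronze_changes():
        # Stream row-level changes from bronze, skipping pre-update images
        # and deletes so every remaining row is an insert or an update.
        return (
            spark.readStream.option("readChangeFeed", "true")
            .table("bronze")
            .filter("_change_type IN ('insert', 'update_postimage')")
        )

    dlt.create_streaming_table(name="silver")
    dlt.apply_changes(
        **apply_changes_args("silver", "bronze_changes", ["id"], "_commit_version")
    )
```

In a pipeline source file, define_silver(spark) would be called at the top level. With stored_as_scd_type set to 1, a change for an existing key updates the silver row instead of adding another entry.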

Thank you
