06-10-2024 03:11 AM
Hi Databricks Community,
I am trying to stream from a bronze table to a silver table, but the bronze table may receive updates. Delta table streaming reads and writes do not support skipChangeCommits=false, i.e. they cannot handle modified records.
I need to handle updates in the bronze table and then update the matching records in the silver table. My issue is that a value might be re-ingested into Databricks with only a subset of its columns updated, and I do not want to add another entry for it.
TL;DR: How do I handle updates in the bronze table (source) in the silver table (target) when streaming between Delta tables?
06-10-2024 11:30 AM
Hi Trilleo,
By default, a Delta Live Table acts as a materialized view, meaning it automatically updates based on its dependencies. This functionality allows for straightforward handling of data changes and dependencies without additional manual intervention.
However, your scenario seems to involve a more complex situation. This requires creating custom merge logic with a Change Data Capture (CDC) feed.
CDC is a feature in Delta Lake that helps track changes (inserts, updates, and deletes) in a table. To handle updates from your bronze table and ensure they are accurately reflected in the silver table, you will need to implement custom merge logic. This involves reading the streaming data from the bronze table and applying merge operations to update or insert records into the silver table based on a key column.
Here is a Databricks blog post that gives an overview of CDC with custom merge logic: Change Data Capture With Delta Live Tables - The Databricks Blog.
This approach ensures that updates in the bronze table are correctly reflected in the silver table without adding duplicate entries, providing a more tailored solution to handle your specific needs.
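For plain PySpark (outside Delta Live Tables), the merge step described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the table name `silver`, the key column `id`, the value columns, and the ordering column `ingested_at` are hypothetical placeholders, and the helpers `build_merge_spec` and `make_upsert` are illustrative names, not part of any library. The Spark and Delta imports are done lazily inside the callback so the helper that builds the merge specification can be checked without a cluster.

```python
def build_merge_spec(key_cols, value_cols, target_alias="t", source_alias="s"):
    """Return (join condition, update map, insert map) for a Delta MERGE."""
    condition = " AND ".join(
        f"{target_alias}.{c} = {source_alias}.{c}" for c in key_cols
    )
    # Only the listed value columns are overwritten on a match.
    update_set = {c: f"{source_alias}.{c}" for c in value_cols}
    # New keys are inserted with key and value columns.
    insert_values = {c: f"{source_alias}.{c}" for c in [*key_cols, *value_cols]}
    return condition, update_set, insert_values


def make_upsert(silver_table, key_cols, value_cols, order_col):
    """Build a foreachBatch callback that merges each micro-batch into silver."""
    condition, update_set, insert_values = build_merge_spec(key_cols, value_cols)

    def upsert(batch_df, batch_id):
        # Lazy imports: these require a Spark runtime with Delta Lake installed.
        from delta.tables import DeltaTable
        from pyspark.sql import Window, functions as F

        # MERGE fails if several source rows match one target row, so keep
        # only the most recent row per key (by the assumed ordering column).
        w = Window.partitionBy(*key_cols).orderBy(F.col(order_col).desc())
        latest = (
            batch_df.withColumn("_rn", F.row_number().over(w))
            .filter("_rn = 1")
            .drop("_rn")
        )

        # DataFrame.sparkSession is available in PySpark 3.3 and later.
        target = DeltaTable.forName(batch_df.sparkSession, silver_table)
        (
            target.alias("t")
            .merge(latest.alias("s"), condition)
            .whenMatchedUpdate(set=update_set)
            .whenNotMatchedInsert(values=insert_values)
            .execute()
        )

    return upsert
```

A streaming DataFrame read from bronze could then be written with something like `stream_df.writeStream.foreachBatch(make_upsert("silver", ["id"], ["name", "price"], "ingested_at")).option("checkpointLocation", "<path>").start()`, where the checkpoint path is left for you to fill in.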
Let me know if you need further clarification or assistance!
06-10-2024 10:35 PM
Hi Tyler,
Thanks for the quick response. That was going to be my next approach as well. I had also come across the article you refer to. However, we are currently using "plain" PySpark in notebooks rather than Delta Live Tables, so I am not sure it can be used directly, but I am certain I can get some inspiration from it.
Thank you.
06-13-2024 02:41 AM
A quick follow-up in case anyone else sees this post: I found this article helpful:
How to Simplify CDC With Delta Lake's Change Data Feed
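For anyone who wants a starting point, reading the Change Data Feed of the bronze table as a stream might look roughly like the sketch below. It assumes the change data feed has been enabled on the bronze table; the table name `bronze`, the key column, and the helper names (`enable_cdf_sql`, `cdf_stream_options`, `read_bronze_changes`, `latest_change_per_key`) are hypothetical and only for illustration. The `_change_type` and `_commit_version` columns are the metadata columns that the change data feed adds.

```python
def enable_cdf_sql(table_name):
    """SQL statement that turns on the change data feed for an existing table."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def cdf_stream_options(starting_version=None):
    """Reader options for streaming a Delta table's change data feed."""
    opts = {"readChangeFeed": "true"}
    if starting_version is not None:
        opts["startingVersion"] = str(starting_version)
    return opts


def read_bronze_changes(spark, bronze_table, starting_version=None):
    """Stream row-level changes (inserts, updates, deletes) from bronze."""
    changes = (
        spark.readStream.format("delta")
        .options(**cdf_stream_options(starting_version))
        .table(bronze_table)
    )
    # Each update yields a pre-image and a post-image row; only the
    # post-image carries the new values, so the pre-image is dropped.
    return changes.filter("_change_type != 'update_preimage'")


def latest_change_per_key(batch_df, key_cols):
    """Keep the newest change per key. Call this inside foreachBatch,
    because row_number windows are not supported on streaming DataFrames."""
    from pyspark.sql import Window, functions as F  # needs a Spark runtime

    w = Window.partitionBy(*key_cols).orderBy(F.col("_commit_version").desc())
    return (
        batch_df.withColumn("_rn", F.row_number().over(w))
        .filter("_rn = 1")
        .drop("_rn")
    )
```

The statement from `enable_cdf_sql("bronze")` would be run once with `spark.sql(...)`. The deduplicated micro-batch can then be merged into the silver table, treating rows whose `_change_type` is `delete` as deletions if you need them.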
06-13-2024 07:49 AM
Hi,
You can use the DLT apply changes API (dlt.apply_changes) to deal with a changing source.
Delta Live Tables Python language reference | Databricks on AWS
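A rough sketch of what that could look like inside a Delta Live Tables pipeline is shown below. The table names `bronze` and `silver`, the view name `bronze_updates`, the key `id`, and the sequencing column `ingested_at` are all assumptions for illustration, as are the helper names `apply_changes_kwargs` and `define_silver`. The `dlt` module can only be imported inside a DLT pipeline, so it is imported lazily here. Setting `ignore_null_updates` is one possible way to handle re-ingested rows that carry only a subset of columns, since null columns then keep their existing values in the target.

```python
def apply_changes_kwargs(target, source, keys, sequence_col):
    """Arguments for dlt.apply_changes, kept as a plain dict for readability."""
    return {
        "target": target,
        "source": source,
        "keys": list(keys),
        # Column used to order changes so the latest one wins.
        "sequence_by": sequence_col,
        # Null columns in an incoming row leave the existing value untouched.
        "ignore_null_updates": True,
        # SCD type 1: update rows in place instead of keeping history.
        "stored_as_scd_type": 1,
    }


def define_silver(spark):
    """Declare the silver table; call this from a DLT pipeline notebook."""
    import dlt  # only available when running inside a DLT pipeline

    @dlt.view(name="bronze_updates")
    def bronze_updates():
        # Assumes bronze receives re-ingested rows as appended records.
        return spark.readStream.table("bronze")

    dlt.create_streaming_table("silver")
    dlt.apply_changes(
        **apply_changes_kwargs("silver", "bronze_updates", ["id"], "ingested_at")
    )
```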
Thank you