Knowledge Sharing Hub

📊 Simplifying CDC with Databricks Delta Live Tables & Snapshots 📊

Ajay-Pandey
Esteemed Contributor III
In the world of data integration, synchronizing external relational databases (such as Oracle or MySQL) with the Databricks platform can be complex, especially when change data capture (CDC) feeds aren't available. Using full-table snapshots is a powerful way to handle this!

🔹 What are Snapshots? Snapshots capture the state of your data at a given time, making it easier to track changes over time and maintain consistency in your data lake.
🔹 SCD Type 1 & 2 Implementation: Delta Live Tables (DLT) in Databricks simplifies handling Slowly Changing Dimensions (SCD) with two main approaches:
- Snapshot Replacement: Overwrite the existing snapshot with a new one.
- Snapshot Accumulation: Maintain multiple snapshots over time for a historical view.
DLT's APPLY CHANGES FROM SNAPSHOT feature streamlines processing these snapshots, allowing you to store records as SCD Type 1 (overwrite) or Type 2 (track historical changes).
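To make the SCD Type 1 versus Type 2 distinction concrete, here is a minimal plain-Python sketch of what processing consecutive full snapshots amounts to. It is an illustration only, not how DLT is implemented: the helper names (`diff_snapshots`, `apply_scd1`, `apply_scd2`) and the sample data are hypothetical. In an actual pipeline this work is delegated to `dlt.apply_changes_from_snapshot`, whose `stored_as_scd_type` argument selects Type 1 or Type 2.

```python
def diff_snapshots(previous, current):
    """Compare two full snapshots (dicts keyed by primary key)."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    deletes = {k: v for k, v in previous.items() if k not in current}
    return inserts, updates, deletes


def apply_scd1(target, current):
    """SCD Type 1: overwrite changed rows in place, keeping no history."""
    inserts, updates, deletes = diff_snapshots(target, current)
    result = dict(target)
    result.update(inserts)
    result.update(updates)
    for key in deletes:
        result.pop(key)
    return result


def apply_scd2(history, previous, current, version):
    """SCD Type 2: close rows that changed or vanished, append new versions.

    Each history row is a dict with key, value, start, and end;
    end=None marks the row that is currently valid for its key.
    """
    inserts, updates, deletes = diff_snapshots(previous, current)
    to_close = set(updates) | set(deletes)
    result = []
    for row in history:
        if row["end"] is None and row["key"] in to_close:
            row = {**row, "end": version}
        result.append(row)
    for key, value in {**inserts, **updates}.items():
        result.append({"key": key, "value": value, "start": version, "end": None})
    return result


# Hypothetical customer snapshots for two consecutive days.
day1 = {1: "Alice, Berlin", 2: "Bob, Paris"}
day2 = {1: "Alice, Munich", 3: "Cara, Rome"}

# Type 1 keeps only the latest state.
scd1 = apply_scd1(day1, day2)

# Type 2 keeps every version, with validity ranges.
history = apply_scd2([], {}, day1, version=1)
history = apply_scd2(history, day1, day2, version=2)
```

With Type 1 the target ends up identical to the latest snapshot, while Type 2 retains the closed-out rows for Alice's old address and for Bob, who no longer appears in the source.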
🔹 Push vs. Pull-Based Snapshots
- Push-Based: Efficient and initiated directly from the source.
- Pull-Based: More flexible but can be resource-intensive, ideal for large data sources.
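A pull-based source can be sketched as a function that, given the last processed snapshot version, returns the next snapshot together with its version, or `None` when nothing is left. As far as the DLT Python documentation describes it, `dlt.apply_changes_from_snapshot` accepts a function of this shape as its `source` and expects a DataFrame in place of the row list used here. The in-memory `SNAPSHOTS` store, the function names, and the sample rows below are all assumptions made for illustration.

```python
# Hypothetical stand-in for snapshots landed in storage, keyed by version.
SNAPSHOTS = {
    1: [{"id": 1, "city": "Berlin"}],
    2: [{"id": 1, "city": "Munich"}],
    3: [{"id": 1, "city": "Munich"}, {"id": 2, "city": "Rome"}],
}


def next_snapshot_and_version(latest_version):
    """Return (snapshot, version) for the oldest unprocessed snapshot.

    latest_version is None on the first run. Returns None when every
    available snapshot has already been processed.
    """
    pending = sorted(
        v for v in SNAPSHOTS if latest_version is None or v > latest_version
    )
    if not pending:
        return None
    version = pending[0]
    return SNAPSHOTS[version], version


def drain(latest_version=None):
    """Pull snapshots in ascending version order until none remain."""
    processed = []
    while (result := next_snapshot_and_version(latest_version)) is not None:
        _snapshot, latest_version = result
        processed.append(latest_version)
    return processed
```

Calling `drain()` walks every stored version in ascending order, and `drain(2)` picks up only what arrived after version 2, which is the checkpoint-style behaviour a pull-based pipeline relies on.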

🛠️ Delta Live Tables Pipelines: With DLT, you can efficiently process CDC data from full snapshots, applying logic to track changes in your data over time and support complex ETL pipelines.
📌 Whether you're managing customer data, tracking order history, or analyzing product changes, using snapshots in DLT with Databricks offers flexibility and performance.

Want to implement this? See: How to perform change data capture (CDC) from full table snapshots using Delta Live Tables
 
 
[Image: Pull-Based Snapshots.png]
Ajay Kumar Pandey
1 REPLY

BilalHaniff1
New Contributor II

Hi Ajay

Can APPLY CHANGES FROM SNAPSHOT handle reprocessing of an older snapshot?

Use case:

- The source has delivered data for days T, T+1, and T+2.

- Consumers realise there is an error in the day T data and make a correction in the source. The source then redelivers the day T data. How will APPLY CHANGES FROM SNAPSHOT handle this use case? If it can't, how would you advise we handle it?
