cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Lakeflow SDP equivalent of whenNotMatchedBySource

mzare
New Contributor II

I have a Lakeflow Connect SCD1 pipeline for SQL Server where I get a mirror of what's live in the source database at the point of ingestion. Now I want to implement a process where I implement a downstream SCD2 table capturing changes for each ingestion. I could've done this by simply specifying SCD2 for my Lakeflow Connect pipeline but this is impossible due to the number of records changing in the source database and being ingested. The cost of storage and operations is simply not accepted by the business.

Now I can implement an SCD2 in an SDP using a delta sink with a forEachBatch function but I am after a native way of handling this scenario in a pipeline. My main issue is to tag deleted records in the target table similar to whenNotMatchedBySource in a merge statement. I came across apply_as_deletes in create_auto_cdc_flow but this is not what I need as there is no column that indicates a record has been deleted in my source table and therefore needs to be updated in my target with an "operation" value. Is there a native way of handling this 

1 ACCEPTED SOLUTION

Accepted Solutions

sameer_yasser
New Contributor II

This is exactly the scenario apply_changes_from_snapshot was designed for. It compares consecutive full snapshots and automatically derives inserts, updates, and deletes by absence no delete indicator column needed.

View solution in original post

2 REPLIES 2

sameer_yasser
New Contributor II

This is exactly the scenario apply_changes_from_snapshot was designed for. It compares consecutive full snapshots and automatically derives inserts, updates, and deletes by absence no delete indicator column needed.

mzare
New Contributor II

Hi Sameer. This is exactly what I came across after my post and seem to be exactly doing what I need. Thanks for confirming my understanding.