- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
4 weeks ago
I have a Lakeflow Connect SCD1 pipeline for SQL Server where I get a mirror of what's live in the source database at the point of ingestion. Now I want to implement a process where I implement a downstream SCD2 table capturing changes for each ingestion. I could've done this by simply specifying SCD2 for my Lakeflow Connect pipeline but this is impossible due to the number of records changing in the source database and being ingested. The cost of storage and operations is simply not accepted by the business.
Now I can implement an SCD2 in an SDP using a delta sink with a forEachBatch function but I am after a native way of handling this scenario in a pipeline. My main issue is to tag deleted records in the target table similar to whenNotMatchedBySource in a merge statement. I came across apply_as_deletes in create_auto_cdc_flow but this is not what I need as there is no column that indicates a record has been deleted in my source table and therefore needs to be updated in my target with an "operation" value. Is there a native way of handling this
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
4 weeks ago
This is exactly the scenario apply_changes_from_snapshot was designed for. It compares consecutive full snapshots and automatically derives inserts, updates, and deletes by absence no delete indicator column needed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
4 weeks ago
Hi Sameer. This is exactly what I came across after my post and seem to be exactly doing what I need. Thanks for confirming my understanding.