01-27-2026 06:20 AM - edited 01-27-2026 06:20 AM
Hi @batch_bender ,
For your case, I recommend using create_auto_cdc_from_snapshot_flow(). Since your system provides full snapshots without row-level operation data, it is the option designed to generate SCD tables accurately from that kind of source.
How it works: It compares the new snapshot to the target to identify changes:
- New keys → INSERT
- Existing keys with different values → UPDATE
- Keys missing from the snapshot but present in target → DELETE
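To make the comparison concrete, here is a minimal sketch in plain Python. It is not how the flow is implemented internally; it only illustrates the three rules above, and it assumes each snapshot can be modelled as a dict from key to row values. The name diff_snapshot is hypothetical.

```python
def diff_snapshot(target, snapshot):
    """Classify changes between the current target and a new full snapshot.

    Both arguments are dicts mapping key -> row values (an assumption made
    for illustration; real snapshots would be tables or DataFrames).
    """
    # Keys in the snapshot that the target has never seen.
    inserts = {k: v for k, v in snapshot.items() if k not in target}
    # Keys present in both, but with different values.
    updates = {k: v for k, v in snapshot.items()
               if k in target and target[k] != v}
    # Keys in the target that have disappeared from the snapshot.
    deletes = [k for k in target if k not in snapshot]
    return inserts, updates, deletes


target = {1: "alice", 2: "bob", 3: "carol"}
snapshot = {1: "alice", 2: "robert", 4: "dave"}

inserts, updates, deletes = diff_snapshot(target, snapshot)
print(inserts)  # {4: 'dave'}
print(updates)  # {2: 'robert'}
print(deletes)  # [3]
```

Key 1 is unchanged, so it appears in none of the three results.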
Implementation Details:
- Processing History: If you have multiple historical snapshots in your landing zone, you'll need a lambda function to tell the flow how to order them. This is the only case where the lambda is necessary.
- Periodic Snapshots: If the source simply overwrites the old snapshot with a new one each day, you can just pass the path or table name directly.
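For the historical case, the ordering logic can be sketched roughly as below. This is plain Python with no Databricks imports, so you can test it on its own. It assumes, as I understand the docs, that the function you pass as the source receives the last processed version and returns a (snapshot, version) pair, or None when nothing is left; please verify that against the current documentation. The names make_next_snapshot_picker and load_snapshot are hypothetical, and load_snapshot stands in for whatever reads one snapshot from your landing zone.

```python
def make_next_snapshot_picker(available_versions, load_snapshot):
    """Build a function that hands out snapshots in ascending version order.

    available_versions: iterable of sortable version ids (e.g. 20260125).
    load_snapshot: hypothetical callable that loads the snapshot for a version.
    """
    ordered = sorted(available_versions)

    def next_snapshot_and_version(latest_version):
        # latest_version is None on the first run (nothing processed yet).
        for version in ordered:
            if latest_version is None or version > latest_version:
                return load_snapshot(version), version
        # No newer snapshot left to process.
        return None

    return next_snapshot_and_version


# Stand-in loader for the demo; a real one would return a DataFrame.
picker = make_next_snapshot_picker(
    [20260126, 20260125, 20260127],
    load_snapshot=lambda v: f"snapshot-{v}",
)

first = picker(None)
second = picker(20260125)
done = picker(20260127)
print(first)   # ('snapshot-20260125', 20260125)
print(second)  # ('snapshot-20260126', 20260126)
print(done)    # None
```

In a pipeline you would pass a function like this as the source instead of a path or table name, so the snapshots are applied oldest first.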
Performance Note: Because create_auto_cdc_from_snapshot_flow() requires a full scan of every snapshot, it can be heavy on large datasets. If the source system eventually gains the ability to provide row-level change logs (CDC), it's worth switching to create_auto_cdc_flow() for better performance.
Hope this helps!