Re: create_auto_cdc_from_snapshot_flow vs create_a...

aleksandra_ch · ‎01-27-2026

For your case, I recommend using create_auto_cdc_from_snapshot_flow(). Since your system provides full snapshots without row-level operation data, this is the only way to accurately generate SCD tables.

How it works: It compares the new snapshot to the target to identify changes:

New keys → INSERT
Existing keys with different values → UPDATE
Keys missing from the snapshot but present in the target → DELETE
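
To make the three rules above concrete, here is a minimal plain-Python sketch of the comparison. It is only an illustration of the idea, not how the flow is implemented internally; the function name diff_snapshot and the sample rows are made up for this example.

```python
def diff_snapshot(target: dict, snapshot: dict) -> dict:
    """Classify changes by comparing a new snapshot to the current target.

    Both arguments map a primary key to a row (here, a small dict).
    """
    # Keys that appear only in the snapshot are new rows.
    inserts = {k: row for k, row in snapshot.items() if k not in target}
    # Keys in both, but with different values, are changed rows.
    updates = {
        k: row for k, row in snapshot.items()
        if k in target and target[k] != row
    }
    # Keys that disappeared from the snapshot are deleted rows.
    deletes = [k for k in target if k not in snapshot]
    return {"insert": inserts, "update": updates, "delete": deletes}


# Hypothetical sample data: key 2 changed, key 3 vanished, key 4 is new.
target = {1: {"name": "Ann"}, 2: {"name": "Bob"}, 3: {"name": "Cy"}}
snapshot = {1: {"name": "Ann"}, 2: {"name": "Robert"}, 4: {"name": "Dee"}}
changes = diff_snapshot(target, snapshot)
```

Note that every key in both the target and the snapshot has to be looked at, which is where the full-scan cost comes from.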

Implementation Details:

How you pass the source depends on how snapshots arrive in the landing zone:

Processing History: If you have multiple historical snapshots in your landing zone, you'll need a lambda function to tell the flow how to order them.
Periodic Snapshots: If the source simply overwrites the old snapshot with a new one each day, you can just pass the path or table name directly.
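
For the historical case, a rough sketch of what the ordering function might look like. Treat this as an outline under assumptions, not a drop-in solution: the table name customers_scd, the key customer_id, the Parquet format, and the helpers list_versions and path_for are all hypothetical, and it assumes snapshot versions are integers. Please check the parameter names against the current docs for your runtime before relying on them.

```python
from typing import Callable, List, Optional


def pick_next_version(available: List[int],
                      latest_processed: Optional[int]) -> Optional[int]:
    """Return the oldest snapshot version not processed yet, or None."""
    pending = sorted(
        v for v in available
        if latest_processed is None or v > latest_processed
    )
    return pending[0] if pending else None


def define_flow(dp, spark,
                list_versions: Callable[[], List[int]],
                path_for: Callable[[int], str]) -> None:
    """Hypothetical wiring; call this from pipeline source code.

    `dp` is assumed to be the pipelines module (`from pyspark import
    pipelines as dp`) and `spark` the active session. `list_versions` and
    `path_for` are placeholders for however you discover snapshots.
    """
    def next_snapshot_and_version(latest_snapshot_version):
        version = pick_next_version(list_versions(), latest_snapshot_version)
        if version is None:
            return None  # nothing left to process in this update
        df = spark.read.format("parquet").load(path_for(version))
        return df, version

    dp.create_streaming_table("customers_scd")
    dp.create_auto_cdc_from_snapshot_flow(
        target="customers_scd",
        source=next_snapshot_and_version,  # function instead of a table name
        keys=["customer_id"],
        stored_as_scd_type=2,
    )
```

For the periodic case you would skip the function entirely and pass the table name or path as source.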

Performance Note: Because create_auto_cdc_from_snapshot_flow() requires a full scan of every snapshot, it can be heavy on large datasets. If the source system eventually gains the ability to provide row-level change logs (CDC), it's better to switch to create_auto_cdc_flow(), which processes only the changed rows.

Hope this helps!