I am deciding between create_auto_cdc_from_snapshot_flow() and create_auto_cdc_flow() in a pipeline.
My source is a daily full snapshot table.
create_auto_cdc_from_snapshot_flow() fits this model, but it requires a source lambda that returns (DataFrame, snapshot_version), which feels heavy to implement compared to just producing CDC rows myself and using create_auto_cdc_flow().
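To make the comparison concrete, here is a minimal plain-Python sketch of the diffing I would have to do myself to turn two consecutive snapshots into CDC rows for create_auto_cdc_flow(). The function name diff_snapshots, the "id" key column, and the INSERT/UPDATE/DELETE labels are my own placeholders, not part of any Databricks API, and a real version would operate on DataFrames rather than lists of dicts.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def diff_snapshots(previous: List[Row], current: List[Row], key: str) -> List[Tuple[str, Row]]:
    """Derive row-level change events by comparing two full snapshots.

    Hypothetical helper for illustration only: it assumes `key` uniquely
    identifies a row within each snapshot.
    """
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    changes: List[Tuple[str, Row]] = []

    # Keys present today: either new rows or rows whose values changed.
    for k, row in curr_by_key.items():
        if k not in prev_by_key:
            changes.append(("INSERT", row))
        elif row != prev_by_key[k]:
            changes.append(("UPDATE", row))

    # Keys that disappeared since the previous snapshot are deletes.
    for k, row in prev_by_key.items():
        if k not in curr_by_key:
            changes.append(("DELETE", row))

    return changes


# Two made-up daily snapshots.
day1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
day2 = [{"id": 1, "name": "a2"}, {"id": 3, "name": "c"}]

changes = diff_snapshots(day1, day2, key="id")
for op, row in changes:
    print(op, row)
```

Even in this toy form, I have to keep the previous snapshot around, detect deletes by absence, and decide what counts as a change, which is the work I assume the snapshot-based API does for me.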
So the question is:
For a system that only provides full daily snapshots (no row-level operations), what are the real technical advantages of using create_auto_cdc_from_snapshot_flow()?
Is snapshot-based AUTO CDC mainly a convenience API, or does it give better correctness, SCD2 handling, or performance guarantees than the create_auto_cdc_flow() approach?