create_auto_cdc_from_snapshot_flow vs create_auto_cdc_flow – when is snapshot CDC actually worth it?

batch_bender
Visitor

I am deciding between create_auto_cdc_from_snapshot_flow() and create_auto_cdc_flow() in a pipeline.

My source is a daily full snapshot table:

  • No operation column (no insert/update/delete flags)
  • Ordering can be derived from snapshot_date (usable as the sequence-by column)
  • Rows are uniquely identified by the key column id

create_auto_cdc_from_snapshot_flow() fits this model, but it requires a source lambda that returns a (DataFrame, snapshot_version) tuple, which feels heavy to implement compared to just producing CDC rows and using create_auto_cdc_flow().
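For context, here is a minimal sketch of the version-selection logic I understand the source lambda has to carry, with the Spark parts left out so it runs anywhere. The function name next_snapshot_version and the integer versions (for example 20240131, derived from snapshot_date) are my own assumptions, as are the helpers mentioned in the trailing comment; none of them are part of the Databricks API.

```python
from typing import Optional, Sequence


def next_snapshot_version(
    latest_processed: Optional[int],
    available_versions: Sequence[int],
) -> Optional[int]:
    """Return the oldest snapshot version not yet processed, or None if caught up.

    latest_processed is None on the first run, so every version is pending.
    """
    pending = sorted(
        v for v in available_versions
        if latest_processed is None or v > latest_processed
    )
    return pending[0] if pending else None


# Assumed wiring inside the pipeline (hypothetical helpers, not executed here):
#
#   def next_snapshot_and_version(latest_version):
#       version = next_snapshot_version(latest_version, list_available_versions())
#       if version is None:
#           return None  # no newer snapshot to process
#       return (load_snapshot(version), version)
```

So the "heavy" part is mostly bookkeeping: finding the next unprocessed snapshot_date and loading only that slice.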

So the question is:

For a system that only provides full daily snapshots (no row-level operations), what are the real technical advantages of using create_auto_cdc_from_snapshot_flow()?

Is snapshot-based AUTO CDC mainly a convenience API, or does it give better correctness, SCD2 handling, or performance guarantees than the create_auto_cdc_flow() approach?
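To make the comparison concrete, this is roughly the diff I would have to maintain myself in order to feed create_auto_cdc_flow(): compare two consecutive snapshots keyed by id and emit insert/update/delete rows. It is only a sketch under my own assumptions. Plain Python dicts stand in for DataFrames, and the name diff_snapshots and the "op" values are placeholders, not anything defined by Databricks.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_snapshots(
    prev: Dict[int, Row],
    curr: Dict[int, Row],
    snapshot_date: str,
) -> List[Row]:
    """Derive CDC-style rows from two full snapshots keyed by id.

    Rows must not contain snapshot_date themselves, otherwise every row
    would compare as changed.
    """
    changes: List[Row] = []
    for key, row in curr.items():
        if key not in prev:
            changes.append({**row, "op": "INSERT", "snapshot_date": snapshot_date})
        elif row != prev[key]:
            changes.append({**row, "op": "UPDATE", "snapshot_date": snapshot_date})
    # Keys that disappeared from the newer snapshot have to be inferred as deletes.
    for key, row in prev.items():
        if key not in curr:
            changes.append({**row, "op": "DELETE", "snapshot_date": snapshot_date})
    return changes
```

The delete branch is the part that worries me most: since the source has no operation column, a deleted row can only be detected by its absence from the later snapshot.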
