Replicate the behaviour of DLT create_auto_cdc_flow

hidden
New Contributor II

I want to write custom logic that replicates the behaviour of DLT's create_auto_cdc_flow. How can we do it?

 

nayan_wylde
Esteemed Contributor II

As of late 2025, Databricks’ Lakeflow Spark Declarative Pipelines (SDP) provides create_auto_cdc_flow() (Python) and AUTO CDC ... INTO (SQL), which replace the older DLT apply_changes() API. They let you configure the CDC behavior declaratively: keys, sequencing, delete/truncate handling, SCD Type 1 vs. Type 2, column-level history tracking, null-update rules, and more.
https://docs.databricks.com/aws/en/ldp/cdc
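
For reference, here is a minimal sketch of what the declarative call can look like in Python. The names customers, customers_cdc, customer_id, event_ts, and op are made up for illustration, and the source is assumed to be a CDC view or table defined elsewhere in the same pipeline. This only runs inside a Lakeflow pipeline, so check the linked docs for the exact parameters available in your runtime.

```python
from pyspark import pipelines as dp
from pyspark.sql.functions import col, expr

# Target streaming table that the CDC flow writes into.
dp.create_streaming_table("customers")

dp.create_auto_cdc_flow(
    target="customers",
    source="customers_cdc",                   # hypothetical CDC source in this pipeline
    keys=["customer_id"],                     # primary key column(s)
    sequence_by=col("event_ts"),              # ordering of change events
    apply_as_deletes=expr("op = 'DELETE'"),   # rows matching this are treated as deletes
    except_column_list=["op", "event_ts"],    # CDC metadata not stored in the target
    stored_as_scd_type=1,                     # keep only the latest state per key
)
```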


Poorva21
Contributor II

create_auto_cdc_flow() is the new API that replaces DLT's apply_changes(). It is used to build declarative CDC pipelines on a change feed such as Delta Change Data Feed (CDF): it ingests inserts, updates, and deletes from a CDC source and applies them to a target streaming table that you define.

You specify keys (the primary key) and sequence_by (event ordering), and you can customize null handling, delete logic, truncation logic, column filtering, and SCD Type 1 or Type 2 storage. Deletes are identified with apply_as_deletes and kept as temporary tombstones with configurable retention. A full table truncation can be triggered with apply_as_truncates (SCD Type 1 only). You can include or exclude specific columns and configure which columns track history.

For SCD Type 2, a target table with an explicitly defined schema must include the special columns __START_AT and __END_AT, with the same data type as sequence_by. The API also supports once=True for backfills (the flow then runs as a batch). It works only with target streaming tables, which you create with create_streaming_table().

https://docs.databricks.com/aws/en/ldp/developer/ldp-python-ref-apply-changes
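
To make the SCD Type 2 part concrete, here is a rough pure-Python sketch of how a batch of CDC events could be turned into history rows with __START_AT and __END_AT. It is not how Databricks implements it internally. The event shape (id, seq, op, plus data columns) and the function name build_scd2 are assumptions for illustration only.

```python
from collections import defaultdict

def build_scd2(events, key="id", seq="seq", op="op"):
    """Rebuild SCD Type 2 history rows from a full batch of CDC events."""
    by_key = defaultdict(dict)
    for e in events:
        # Events with the same key and sequence value collapse into one.
        by_key[e[key]][e[seq]] = e

    rows = []
    for versions in by_key.values():
        # Order by the sequencing column, not by arrival order.
        ordered = [versions[s] for s in sorted(versions)]
        for i, e in enumerate(ordered):
            if e[op] == "DELETE":
                continue  # a delete only closes the previous version
            # A version ends where the next event for the same key starts.
            end = ordered[i + 1][seq] if i + 1 < len(ordered) else None
            data = {c: v for c, v in e.items() if c not in (seq, op)}
            rows.append({**data, "__START_AT": e[seq], "__END_AT": end})
    return rows

# Hypothetical events, arriving out of order and with one duplicate.
events = [
    {"id": 1, "seq": 3, "op": "UPSERT", "city": "Oslo"},
    {"id": 1, "seq": 1, "op": "UPSERT", "city": "Rome"},
    {"id": 1, "seq": 3, "op": "UPSERT", "city": "Oslo"},
    {"id": 1, "seq": 5, "op": "DELETE", "city": None},
]
history = build_scd2(events)
```

The current version of a key is the row whose __END_AT is None. This sketch recomputes everything from the full event list, which is simple but would not scale like an incremental merge.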

Hubert-Dudek
Databricks MVP

And if you write it yourself, you need to handle dozens of edge cases, such as late-arriving data, duplicate data, data in the wrong order, etc.
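
A small pure-Python sketch of what handling those cases means for an SCD Type 1 style merge. The function name apply_cdc_scd1 and the event shape (id, seq, op) are assumptions for illustration. In a real pipeline the same rules would more likely be expressed as a Delta MERGE inside foreachBatch, and the tombstones would need to be pruned after some retention period.

```python
def apply_cdc_scd1(target, tombstones, events, key="id", seq="seq", op="op"):
    """Apply CDC events in place, keeping only the latest state per key.

    target:     dict of key -> row (each row keeps its sequence value)
    tombstones: dict of key -> sequence value of the latest applied delete
    """
    for e in events:
        k, s = e[key], e[seq]
        current = target.get(k)
        last_seen = current[seq] if current is not None else tombstones.get(k)
        # Late, out-of-order, or duplicate event: an equal or newer one was already applied.
        if last_seen is not None and s <= last_seen:
            continue
        if e[op] == "DELETE":
            target.pop(k, None)
            # Remember the delete so an older upsert cannot bring the row back.
            tombstones[k] = s
        else:
            target[k] = {c: v for c, v in e.items() if c != op}
            tombstones.pop(k, None)
    return target

# Hypothetical events covering the edge cases above.
target, tombstones = {}, {}
apply_cdc_scd1(target, tombstones, [
    {"id": 1, "seq": 2, "op": "UPSERT", "name": "b"},
    {"id": 1, "seq": 1, "op": "UPSERT", "name": "a"},   # late arrival, skipped
    {"id": 1, "seq": 2, "op": "UPSERT", "name": "b"},   # duplicate, skipped
    {"id": 2, "seq": 5, "op": "DELETE", "name": None},  # delete arrives first
    {"id": 2, "seq": 4, "op": "UPSERT", "name": "x"},   # older than the delete, skipped
])
```

The key design choice is that every decision is made on the sequencing column, never on arrival order, which is also why deletes have to be remembered for a while instead of just dropping the row.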


My blog: https://databrickster.medium.com/