Topic: Replicate the behaviour of DLT create auto CDC flow (Data Engineering)

hidden — Mon, 08 Dec 2025 12:49:13 GMT

I want to write my own code that replicates the behaviour of the DLT create auto CDC flow. How can we do it?

Re: replicate the behaviour of DLT create auto cdc flow

nayan_wylde — Mon, 08 Dec 2025 16:06:19 GMT

As of late 2025, Databricks’ Lakeflow Spark Declarative Pipelines (SDP) provides create_auto_cdc_flow() (Python) and AUTO CDC ... INTO (SQL), which replace the older DLT apply_changes API. They let you configure CDC behavior declaratively: keys, sequencing, delete/truncate handling, SCD Type 1 vs Type 2, column-level history tracking, null-update rules, and more.
https://docs.databricks.com/aws/en/ldp/cdc
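If you do want to hand-roll the SCD Type 1 part outside declarative pipelines, a common pattern is a Structured Streaming foreachBatch that runs one Delta MERGE per micro-batch. The sketch below only builds that MERGE statement as a string, so it runs without Spark. Assumptions (not part of any Databricks API): the target stores the `sequence_by` column next to the key and value columns, and the source has a hypothetical `op` column whose value 'DELETE' marks a delete, standing in for apply_as_deletes. It approximates the keys / sequencing / delete / null-update behaviour listed above; it is not how AUTO CDC is implemented.

```python
from typing import List


def build_cdc_merge_sql(target: str, source_view: str, keys: List[str],
                        sequence_by: str, columns: List[str],
                        op_column: str = "op", delete_value: str = "DELETE",
                        ignore_null_updates: bool = False) -> str:
    """Build a Delta MERGE that applies one micro-batch of CDC rows as SCD Type 1.

    Assumptions: the target stores `sequence_by` next to `keys` and `columns`,
    and the source carries a hypothetical `op_column` whose value
    `delete_value` marks a delete.
    """
    def new_value(col: str) -> str:
        # With ignore_null_updates, a NULL in the change keeps the old value.
        return f"COALESCE(s.{col}, t.{col})" if ignore_null_updates else f"s.{col}"

    all_cols = keys + columns + [sequence_by]
    set_clause = ", ".join(
        [f"{c} = {new_value(c)}" for c in columns] + [f"{sequence_by} = s.{sequence_by}"]
    )
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    is_delete = f"s.{op_column} = '{delete_value}'"
    is_newer = f"s.{sequence_by} > t.{sequence_by}"  # skip stale or duplicate events
    return "\n".join([
        f"MERGE INTO {target} AS t",
        "USING (",
        # Keep only the latest change per key within this micro-batch.
        "  SELECT * FROM (",
        f"    SELECT *, ROW_NUMBER() OVER (PARTITION BY {', '.join(keys)}"
        f" ORDER BY {sequence_by} DESC) AS _rn",
        f"    FROM {source_view}",
        "  ) AS ranked WHERE _rn = 1",
        ") AS s",
        f"ON {on_clause}",
        f"WHEN MATCHED AND {is_newer} AND {is_delete} THEN DELETE",
        f"WHEN MATCHED AND {is_newer} THEN UPDATE SET {set_clause}",
        f"WHEN NOT MATCHED AND NOT ({is_delete}) THEN",
        f"  INSERT ({', '.join(all_cols)}) VALUES ({', '.join('s.' + c for c in all_cols)})",
    ])


print(build_cdc_merge_sql("customers", "updates", ["id"], "seq", ["name", "city"]))
```

In a streaming job you would typically call this from a foreachBatch function that runs `batch_df.createOrReplaceTempView("updates")` and then `batch_df.sparkSession.sql(build_cdc_merge_sql(...))`. The ROW_NUMBER step matters because Delta MERGE fails when several source rows match the same target row. Note that without tombstones, an update that arrives after its delete (but is older than it) will re-insert the row.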

Re: replicate the behaviour of DLT create auto cdc flow

Poorva21 — Mon, 08 Dec 2025 16:41:36 GMT

create_auto_cdc_flow() is the new API that replaces DLT apply_changes(). It builds declarative CDC pipelines: it ingests inserts, updates, and deletes from a CDC source (for example a Delta Change Data Feed (CDF)) and applies them to a target streaming table you define. You specify keys (the primary key) and sequence_by (the event ordering), and you can customize null handling (ignore_null_updates), delete logic, truncate logic, column filtering, and whether the target is stored as SCD Type 1 or Type 2.

Deletes are identified with apply_as_deletes, which uses temporary tombstones with configurable retention. Full-table truncation can be triggered with apply_as_truncates (SCD Type 1 only). You can include or exclude specific columns (column_list / except_column_list) and choose which columns track history (track_history_column_list / track_history_except_column_list).

For SCD Type 2, the target table carries the special columns __START_AT and __END_AT, with the same data type as the sequence_by column; if you declare the target schema yourself, you must include them. once=True is supported for backfills (the flow runs once, as a batch). The target must be a streaming table created with create_streaming_table().

https://docs.databricks.com/aws/en/ldp/developer/ldp-python-ref-apply-changes
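To make the __START_AT / __END_AT semantics concrete, here is a minimal in-memory sketch of SCD Type 2 versioning in plain Python, where a list of dicts stands in for the target table. Assumptions: integer sequence values and a hypothetical op column ("UPSERT" or "DELETE") in place of apply_as_deletes. Two deliberate simplifications: an event that is not newer than the key's current version is simply dropped rather than merged into history, and an update that changes no column opens no new version. The real flow may treat these cases differently; this only illustrates the shape of the output.

```python
from typing import Any, Dict, List, Optional, Tuple

# A change event is (key, seq, op, data); op is "UPSERT" or "DELETE".
# The op column is hypothetical and stands in for an apply_as_deletes condition.
Event = Tuple[Any, int, str, Dict[str, Any]]


def open_version(history: List[Dict[str, Any]], key: Any) -> Optional[Dict[str, Any]]:
    """Return the current version of `key` (the row whose __END_AT is None)."""
    for row in history:
        if row["key"] == key and row["__END_AT"] is None:
            return row
    return None


def apply_scd2(history: List[Dict[str, Any]], batch: List[Event]) -> None:
    """Apply a micro-batch to an SCD Type 2 history table held as a list of rows."""
    # Process events in sequence order so in-batch disorder does not matter.
    for key, seq, op, data in sorted(batch, key=lambda e: e[1]):
        current = open_version(history, key)
        if current is not None and seq <= current["__START_AT"]:
            continue  # late or duplicate event: dropped here, not merged into history
        if op == "DELETE":
            if current is not None:
                current["__END_AT"] = seq  # close the version; no new version opens
            continue
        if current is not None:
            if all(current.get(col) == val for col, val in data.items()):
                continue  # nothing changed, so no new version is needed
            current["__END_AT"] = seq  # close the previous version
        history.append({"key": key, **data, "__START_AT": seq, "__END_AT": None})


history: List[Dict[str, Any]] = []
apply_scd2(history, [
    (1, 20, "UPSERT", {"city": "Rome"}),
    (1, 10, "UPSERT", {"city": "Oslo"}),  # arrives first in the batch but is older
])
apply_scd2(history, [(1, 30, "DELETE", {})])
for row in history:
    print(row)
```

Each version's __END_AT equals the sequence value of the change that superseded or deleted it, which is why these columns need the same type as sequence_by.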

Re: replicate the behaviour of DLT create auto cdc flow

Hubert-Dudek — Mon, 08 Dec 2025 16:50:12 GMT

And if you write it yourself, you need to handle dozens of edge cases, such as late-arriving data, duplicate data, out-of-order data, etc.
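Late and duplicate data are where hand-written CDC usually breaks, especially around deletes. One approach, similar in spirit to the temporary tombstones mentioned above for apply_as_deletes, is to remember the sequence value of each delete for a retention window so that an older update arriving late cannot resurrect the row. Minimal in-memory sketch, assuming integer sequence values and a hypothetical op of "UPSERT" or "DELETE"; retention is measured in sequence units here purely for illustration.

```python
from typing import Any, Dict, Optional, Tuple


class Scd1WithTombstones:
    """SCD Type 1 state that remembers deletes so late events cannot resurrect rows.

    rows:       key -> (seq, data) for live rows
    tombstones: key -> seq of the delete that removed the row
    """

    def __init__(self, tombstone_retention: int) -> None:
        self.rows: Dict[Any, Tuple[int, Dict[str, Any]]] = {}
        self.tombstones: Dict[Any, int] = {}
        self.retention = tombstone_retention  # in sequence units, for this sketch

    def _last_seq(self, key: Any) -> Optional[int]:
        """Highest sequence already applied for key, from a live row or a tombstone."""
        if key in self.rows:
            return self.rows[key][0]
        return self.tombstones.get(key)

    def apply(self, key: Any, seq: int, op: str,
              data: Optional[Dict[str, Any]] = None) -> bool:
        """Apply one change event; return False if it was ignored as stale or duplicate."""
        last = self._last_seq(key)
        if last is not None and seq <= last:
            return False
        if op == "DELETE":
            self.rows.pop(key, None)
            self.tombstones[key] = seq
        else:
            self.tombstones.pop(key, None)
            self.rows[key] = (seq, dict(data or {}))
        return True

    def purge_tombstones(self, watermark: int) -> None:
        """Forget deletes older than watermark - retention to bound state size."""
        cutoff = watermark - self.retention
        self.tombstones = {k: s for k, s in self.tombstones.items() if s >= cutoff}


state = Scd1WithTombstones(tombstone_retention=100)
state.apply("a", 1, "UPSERT", {"v": 1})
state.apply("a", 3, "DELETE")
late = state.apply("a", 2, "UPSERT", {"v": 2})  # older than the delete: ignored
dup = state.apply("a", 3, "DELETE")              # exact redelivery: ignored
print(late, dup, state.rows, state.tombstones)
```

The trade-off shows up after purge_tombstones: once a tombstone is gone, a sufficiently late update can re-create the row, so the retention window has to cover the maximum lateness you expect.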