12-08-2025 04:49 AM
I want to customize the behavior of the DLT create_auto_cdc_flow. How can we do it?
12-08-2025 08:06 AM
As of late 2025, Databricks’ Lakeflow Spark Declarative Pipelines (SDP) provides create_auto_cdc_flow() (Python) and AUTO CDC ... INTO (SQL). These replace the older DLT apply_changes API and let you customize the CDC behavior declaratively: keys, sequencing, delete/truncate handling, SCD Type 1 vs. Type 2, column-level history, null-update rules, and more.
https://docs.databricks.com/aws/en/ldp/cdc
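To make those declarative options concrete, here is a toy, in-memory Python model of SCD Type 1 semantics: events are matched on `keys`, ordered by `sequence_by`, deletes are recognized by an `apply_as_deletes` condition, and `ignore_null_updates` keeps existing values for null columns. The helper `apply_scd1` and its dict-based rows are hypothetical. The parameter names only mirror create_auto_cdc_flow() (where `apply_as_deletes` is a SQL expression or Column, not a Python callable), so treat this as a sketch of the behavior under those assumptions, not the Databricks implementation.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def apply_scd1(
    events: Iterable[Row],
    keys: Sequence[str],
    sequence_by: str,
    apply_as_deletes: Optional[Callable[[Row], bool]] = None,
    ignore_null_updates: bool = False,
    except_column_list: Sequence[str] = (),
) -> Dict[Key, Row]:
    """Toy SCD Type 1 model (hypothetical helper, not the Databricks implementation)."""
    target: Dict[Key, Row] = {}
    # Last applied sequence value per key. Entries survive deletes and act like
    # tombstones (they never expire in this sketch), so a stale update cannot
    # resurrect a deleted row.
    last_seq: Dict[Key, Any] = {}
    for event in events:
        key = tuple(event[k] for k in keys)
        seq = event[sequence_by]
        if key in last_seq and seq <= last_seq[key]:
            continue  # out-of-order or duplicate: a newer-or-equal change already won
        last_seq[key] = seq
        if apply_as_deletes is not None and apply_as_deletes(event):
            target.pop(key, None)
            continue
        current = target.get(key)
        row = dict(current) if current is not None else {}
        for col, value in event.items():
            if col in except_column_list:
                continue  # e.g. keep the operation-type column out of the target
            if value is None and ignore_null_updates and current is not None:
                continue  # keep the existing value instead of overwriting with null
            row[col] = value
        target[key] = row
    return target


if __name__ == "__main__":
    events = [
        {"id": 1, "name": "a", "city": "X", "op": "INSERT", "ts": 1},
        {"id": 1, "name": None, "city": "Y", "op": "UPDATE", "ts": 3},
        {"id": 1, "name": "stale", "city": "Z", "op": "UPDATE", "ts": 2},  # arrives late
        {"id": 2, "name": "b", "city": "X", "op": "INSERT", "ts": 1},
        {"id": 2, "name": None, "city": None, "op": "DELETE", "ts": 4},
        {"id": 2, "name": "late", "city": "Q", "op": "UPDATE", "ts": 3},  # older than the delete
    ]
    print(apply_scd1(events, keys=["id"], sequence_by="ts",
                     apply_as_deletes=lambda e: e["op"] == "DELETE",
                     ignore_null_updates=True, except_column_list=["op"]))
    # {(1,): {'id': 1, 'name': 'a', 'city': 'Y', 'ts': 3}}
```

Keeping the last sequence value per key even after a delete is what lets the late `ts=3` update for `id=2` be discarded instead of re-creating the row.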
12-08-2025 08:41 AM
create_auto_cdc_flow() is the new API that replaces DLT apply_changes() for building declarative CDC pipelines on Delta Change Data Feed (CDF). It ingests inserts, updates, and deletes from a CDC source and applies them to a target streaming table that you define. You specify the keys (primary key) and sequence_by (event ordering), and you can customize:
- null handling (ignore_null_updates);
- delete logic: apply_as_deletes marks events as deletes, and deleted rows are kept temporarily as tombstones with configurable retention;
- truncation logic: apply_as_truncates triggers a full table truncation (SCD Type 1 only);
- column filtering: include or exclude specific columns (column_list / except_column_list);
- SCD Type 1 or Type 2 storage (stored_as_scd_type), and which columns track history (track_history_column_list / track_history_except_column_list).

For SCD Type 2, if you declare the target schema yourself, it must include the __START_AT and __END_AT columns with the same data type as the sequence_by column. Set once=True for one-time backfills, which run as a batch. The target must be a streaming table created with create_streaming_table().
https://docs.databricks.com/aws/en/ldp/developer/ldp-python-ref-apply-changes
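A similar toy model can illustrate SCD Type 2 storage: each change closes the current version by setting __END_AT and opens a new one with __START_AT, a delete closes the current version without opening a new one, and changes to columns outside `track_history_column_list` update the open version in place. `apply_scd2` is hypothetical. It sorts the whole input by the sequence column first (an assumption standing in for the out-of-order handling the managed flow performs) and, unlike a real target table, records the sequence value only in __START_AT/__END_AT. It is a sketch, not the Databricks implementation.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def apply_scd2(
    events: Iterable[Row],
    keys: Sequence[str],
    sequence_by: str,
    apply_as_deletes: Optional[Callable[[Row], bool]] = None,
    except_column_list: Sequence[str] = (),
    track_history_column_list: Optional[Sequence[str]] = None,
) -> List[Row]:
    """Toy SCD Type 2 model (hypothetical helper, not the Databricks implementation)."""
    history: List[Row] = []
    # Open (current) version per key; these dicts are the same objects stored in
    # `history`, so closing or updating them also updates the history list.
    open_versions: Dict[Tuple[Any, ...], Row] = {}
    # Assumption of this sketch: sort the whole input so late events land in order.
    for event in sorted(events, key=lambda e: e[sequence_by]):
        key = tuple(event[k] for k in keys)
        seq = event[sequence_by]
        current = open_versions.get(key)
        if apply_as_deletes is not None and apply_as_deletes(event):
            if current is not None:
                current["__END_AT"] = seq  # close the version; no new one is opened
                del open_versions[key]
            continue
        values = {c: v for c, v in event.items()
                  if c not in except_column_list and c != sequence_by}
        # By default every non-key column is tracked for history.
        tracked = track_history_column_list or [c for c in values if c not in keys]
        if current is not None:
            if all(current.get(c) == values.get(c) for c in tracked):
                current.update(values)  # only untracked columns changed: update in place
                continue
            current["__END_AT"] = seq  # close the previous version
        new_version = {**values, "__START_AT": seq, "__END_AT": None}
        history.append(new_version)
        open_versions[key] = new_version
    return history


if __name__ == "__main__":
    events = [
        {"id": 1, "city": "X", "phone": "111", "op": "INSERT", "ts": 1},
        {"id": 1, "city": "X", "phone": "222", "op": "UPDATE", "ts": 2},  # untracked change
        {"id": 1, "city": "Y", "phone": "222", "op": "UPDATE", "ts": 4},
        {"id": 1, "city": None, "phone": None, "op": "DELETE", "ts": 5},
        {"id": 1, "city": "W", "phone": "333", "op": "UPDATE", "ts": 3},  # arrives late
    ]
    for row in apply_scd2(events, keys=["id"], sequence_by="ts",
                          apply_as_deletes=lambda e: e["op"] == "DELETE",
                          except_column_list=["op"], track_history_column_list=["city"]):
        print(row)
```

With only `city` tracked, the phone change at `ts=2` does not create a new version; dropping `track_history_column_list` makes every column change produce one.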
12-08-2025 08:50 AM
And you would need to handle dozens of edge cases, such as late-arriving data, duplicate data, out-of-order data, etc.
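If you hand-write the flow instead (for example, a Structured Streaming foreachBatch that runs a Delta MERGE per micro-batch), one of those edge cases shows up immediately: MERGE can fail when several source rows match the same target row, so each micro-batch has to be collapsed to one event per key first. The hypothetical `latest_per_key` helper sketches that step in plain Python. It keeps the highest-sequence event per key, collapses exact duplicates, and sets aside events with a null sequence value (a choice made for this sketch). A real pipeline would still need to compare against the sequence already stored in the target to drop late arrivals across batches.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]


def latest_per_key(
    batch: Iterable[Row], keys: Sequence[str], sequence_by: str
) -> Tuple[List[Row], List[Row]]:
    """Collapse one micro-batch to at most one event per key (hypothetical helper).

    Returns (winners, rejected): winners holds the highest-sequence event per key,
    rejected holds events whose sequence value is None.
    """
    winners: Dict[Tuple[Any, ...], Row] = {}
    rejected: List[Row] = []
    for event in batch:
        seq = event.get(sequence_by)
        if seq is None:
            rejected.append(event)  # cannot be ordered, so set it aside
            continue
        key = tuple(event[k] for k in keys)
        best = winners.get(key)
        # Strictly greater: on a sequence tie the first event seen is kept,
        # so exact duplicates collapse to a single row.
        if best is None or seq > best[sequence_by]:
            winners[key] = event
    return list(winners.values()), rejected


if __name__ == "__main__":
    batch = [
        {"id": 1, "v": "a", "ts": 1},
        {"id": 1, "v": "b", "ts": 3},
        {"id": 1, "v": "b", "ts": 3},  # exact duplicate
        {"id": 1, "v": "c", "ts": 2},  # out of order within the batch
        {"id": 2, "v": "x", "ts": None},
        {"id": 3, "v": "y", "ts": 1},
    ]
    winners, rejected = latest_per_key(batch, ["id"], "ts")
    print(winners)   # [{'id': 1, 'v': 'b', 'ts': 3}, {'id': 3, 'v': 'y', 'ts': 1}]
    print(rejected)  # [{'id': 2, 'v': 'x', 'ts': None}]
```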
My blog: https://databrickster.medium.com/