Tuesday - last edited Tuesday
This is a known limitation of create_auto_cdc_flow / AUTO CDC INTO, and unfortunately there is no native way to achieve exactly what you want with the API's parameters. Here's why, and what you can do about it:
The Core Problem
The track_history_except_column_list behavior is a binary choice:
In the list → column change does NOT trigger a new version, and the current active row is updated in-place with the new value (SCD1-like behavior for that column)
Not in the list → column change triggers a new version
There is no third option like "don't trigger a new version, but also don't update in-place."
https://docs.databricks.com/aws/en/ldp/cdc
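To make the two branches concrete, here is a toy, pure-Python model of the behavior described above. It is not Databricks code and says nothing about how the engine is implemented. The names apply_change, history, and untracked are hypothetical, with untracked standing in for track_history_except_column_list.

```python
def apply_change(history, key, seq, values, untracked=()):
    """Toy SCD2 apply. `history` is a list of dict rows (hypothetical shape)."""
    # Find the currently active version for this key, if any
    current = next(
        (r for r in history if r["id"] == key and r["__END_AT"] is None), None
    )
    if current is None:
        history.append({"id": key, **values, "__START_AT": seq, "__END_AT": None})
        return
    # Only columns outside `untracked` can open a new version
    tracked_changed = any(
        current.get(col) != val
        for col, val in values.items()
        if col not in untracked
    )
    if tracked_changed:
        current["__END_AT"] = seq  # close the old version
        history.append({"id": key, **values, "__START_AT": seq, "__END_AT": None})
    else:
        current.update(values)  # untracked change: overwrite in place


UNTRACKED = ("origin_metadata",)
history = []
apply_change(history, 1, 1, {"name": "a", "origin_metadata": "m1"}, UNTRACKED)
apply_change(history, 1, 2, {"name": "a", "origin_metadata": "m2"}, UNTRACKED)
apply_change(history, 1, 3, {"name": "b", "origin_metadata": "m3"}, UNTRACKED)
```

After these three events the model holds two versions. The first one carries origin_metadata "m2" rather than "m1", because the second event overwrote it in place without opening a new version. That overwrite is the behavior you are trying to avoid.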
Workarounds
Strip the metadata column from the source before CDC
Pre-process your source view to exclude the metadata column entirely from the CDC flow input, then join it back after the fact using a separate pipeline step.
from pyspark import pipelines as dp

@dp.view
def source_for_cdc():
    # Drop the metadata column so CDC never sees it
    return spark.readStream.table("raw_source").drop("origin_metadata")

# The target streaming table must exist before the CDC flow writes to it
dp.create_streaming_table("target_history")

dp.create_auto_cdc_flow(
    target="target_history",
    source="source_for_cdc",
    keys=["id"],
    sequence_by="timestamp",
    stored_as_scd_type="2",
    # origin_metadata is simply absent - no tracking concerns
)
Then enrich the target with origin_metadata in a downstream table: join target_history to a separate side table that tracks (key, sequence, origin_metadata), matching on the key and on __START_AT = sequence, so each version picks up the metadata of the change that opened it.
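As a rough illustration of that join, the sketch below uses plain Python dicts instead of DataFrames so the matching rule is easy to see. In a real pipeline this would more likely be a left join between the history table and the side table. The function name enrich, the side-table column name sequence, and the sample rows are all assumptions made for the example.

```python
def enrich(history_rows, side_rows):
    """Attach origin_metadata from the side table to each versioned row.

    A version matches the side-table entry with the same id whose
    sequence equals the version's __START_AT (hypothetical column names).
    """
    lookup = {(r["id"], r["sequence"]): r["origin_metadata"] for r in side_rows}
    return [
        {**row, "origin_metadata": lookup.get((row["id"], row["__START_AT"]))}
        for row in history_rows
    ]


# Versions produced by the CDC flow (origin_metadata was dropped upstream)
history_rows = [
    {"id": 1, "name": "a", "__START_AT": 1, "__END_AT": 3},
    {"id": 1, "name": "b", "__START_AT": 3, "__END_AT": None},
]
# Side table keeping every (id, sequence, origin_metadata) seen in the source
side_rows = [
    {"id": 1, "sequence": 1, "origin_metadata": "m1"},
    {"id": 1, "sequence": 2, "origin_metadata": "m2"},
    {"id": 1, "sequence": 3, "origin_metadata": "m3"},
]
enriched = enrich(history_rows, side_rows)
```

The event at sequence 2 changed only origin_metadata, so it never opened a version and finds no match. Each version therefore keeps the metadata it had when it was created, which is the "frozen" behavior the native API cannot express.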
In short: strip the metadata column from CDC entirely, keep it in a side table, and join it back in a downstream node. This keeps the CDC flow semantically clean (it only tracks what it should track) and gives you full control over when and how origin_metadata appears on versioned rows.
There is no native track_history_except_column_list + "freeze on no version change" behavior in the current API, so any solution requires pre- or post-processing outside the CDC flow itself.
Thanks.