topic Re: how to update not tracked column only in new row version in create_auto_cdc_flow in Data Engineering

how to update not tracked column only in new row version in create_auto_cdc_flow

rplazaman — Tue, 31 Mar 2026 16:46:48 GMT

Hi,
I'm using create_auto_cdc_flow with SCD type 2. My source has a metadata column that records the origin of each row. Changes to this column should not trigger a new row version, so I added it to track_history_except_column_list.
I don't want to add it to the except column list, because when a new version of a row is created I would like this value to be written together with that new version.

However, there is an unwanted behavior: if this value changes and the other values do not (so no new row is created), the latest version of the row in the target is updated with the value from the source. I would like the value in the target to stay as it is and only be written when a new version of the row is created.

Does anybody know how to achieve this?

Re: how to update not tracked column only in new row version in create_auto_cdc_flow

lingareddy_Alva — Tue, 31 Mar 2026 17:34:42 GMT

@rplazaman

This is a well-known limitation of create_auto_cdc_flow / AUTO CDC INTO, and unfortunately there is no native way to achieve exactly what you want within the API's parameters. Here's why, and what you can do about it:

The Core Problem
The track_history_except_column_list behavior is a binary choice:

In the list → column change does NOT trigger a new version, and the current active row is updated in-place with the new value (SCD1-like behavior for that column)
Not in the list → column change triggers a new version

There is no third option like "don't trigger a new version, but also don't update in-place."
https://docs.databricks.com/aws/en/ldp/cdc
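
To make the two behaviors concrete, here is a toy pure-Python model of an SCD type 2 merge as described above. It is only a sketch, not the Databricks implementation: the function name apply_change, the row layout, and the start_at / end_at fields are assumptions made for illustration, and it ignores deletes and out-of-order events.

```python
def apply_change(history, change, keys, sequence_by, untracked):
    """Toy SCD2 merge: apply one change record to `history` (a list of dict rows)."""
    # Find the currently active version for this key, if there is one.
    current = next(
        (r for r in history
         if r["end_at"] is None and all(r[k] == change[k] for k in keys)),
        None,
    )
    data_cols = [c for c in change if c not in keys and c != sequence_by]
    tracked_changed = current is None or any(
        current[c] != change[c] for c in data_cols if c not in untracked
    )
    if tracked_changed:
        # A tracked column changed: close the active row and open a new version.
        if current is not None:
            current["end_at"] = change[sequence_by]
        new_row = {c: change[c] for c in keys + data_cols}
        new_row.update(start_at=change[sequence_by], end_at=None)
        history.append(new_row)
    else:
        # Only untracked columns changed: overwrite them on the active row.
        for c in untracked:
            if c in change:
                current[c] = change[c]
    return history


history = []
common = dict(keys=["id"], sequence_by="ts", untracked={"origin"})
apply_change(history, {"id": 1, "ts": 1, "name": "a", "origin": "sysA"}, **common)
# Only the untracked column changes: no new version, value overwritten in place.
apply_change(history, {"id": 1, "ts": 2, "name": "a", "origin": "sysB"}, **common)
versions_after_untracked_change = len(history)
origin_after_untracked_change = history[0]["origin"]
# A tracked column changes: the active row is closed and a new version is opened.
apply_change(history, {"id": 1, "ts": 3, "name": "b", "origin": "sysC"}, **common)
```

In this model the second event silently replaces "sysA" with "sysB" on the active row, which is exactly the behavior the question wants to avoid.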

Workarounds
Strip the metadata column from the source before CDC
Pre-process your source view to exclude the metadata column entirely from the CDC flow input, then join it back after the fact using a separate pipeline step.

from pyspark import pipelines as dp

@dp.view
def source_for_cdc():
    # Drop the metadata column so CDC never sees it
    return spark.readStream.table("raw_source").drop("origin_metadata")

# The target streaming table must exist before the flow writes to it
dp.create_streaming_table("target_history")

dp.create_auto_cdc_flow(
    target="target_history",
    source="source_for_cdc",
    keys=["id"],
    sequence_by="timestamp",
    stored_as_scd_type="2",
    # origin_metadata is simply absent, so there is nothing to track or exclude
)

Then enrich the target with origin metadata in a downstream table by joining on the key + __START_AT, matching it against a separate table that tracks (key, sequence, origin_metadata).
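
A minimal sketch of that enrichment rule, written in plain Python over lists of dicts so the matching logic is easy to follow. It assumes that __START_AT on each version equals the sequence_by value of the source event that opened it, and that the side table holds one row per (key, sequence). The names enrich_versions and origin_events and the sample values are hypothetical; in a real pipeline the same rule would be a left join between the target and the side table.

```python
def enrich_versions(versions, origin_events):
    """Attach origin_metadata to each SCD2 version from the event that opened it."""
    # Index the side table by (key, sequence) for direct lookup.
    origin_by_event = {
        (e["id"], e["timestamp"]): e["origin_metadata"] for e in origin_events
    }
    enriched = []
    for v in versions:
        row = dict(v)
        # Left-join semantics: None when no event matches this version's start.
        row["origin_metadata"] = origin_by_event.get((v["id"], v["__START_AT"]))
        enriched.append(row)
    return enriched


versions = [
    {"id": 1, "name": "a", "__START_AT": 1, "__END_AT": 3},
    {"id": 1, "name": "b", "__START_AT": 3, "__END_AT": None},
]
origin_events = [
    {"id": 1, "timestamp": 1, "origin_metadata": "sysA"},
    {"id": 1, "timestamp": 2, "origin_metadata": "sysB"},  # did not open a version
    {"id": 1, "timestamp": 3, "origin_metadata": "sysC"},
]
result = enrich_versions(versions, origin_events)
```

Because the event at timestamp 2 did not open a version, its origin value never reaches the enriched rows; each version keeps the origin it had when it was created.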

Strip the metadata column from CDC entirely, keep it in a side table, and join it back in a downstream node. This keeps the CDC flow semantically clean (it only tracks what it should track) and gives you full control over when and how origin_metadata appears on versioned rows.
There is no native track_history_except_column_list + "freeze on no version change" behavior in the current API, so any solution requires pre- or post-processing outside the CDC flow itself.

Thanks.

Re: how to update not tracked column only in new row version in create_auto_cdc_flow

rplazaman — Tue, 31 Mar 2026 18:06:42 GMT

This is what I was afraid of, but thanks for the full explanation and the workaround.