We're currently working with Databricks AUTO CDC in a data pipeline and have encountered an issue with create_auto_cdc_flow (AUTO CDC) when using SCD Type 2.
We are using the following configuration:
stored_as_scd_type = 2
apply_as_deletes = expr("operation = 'DELETE'")
sequence_by = date_extraction
Target is a streaming table created via create_streaming_table
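For reference, the flow definition looks roughly like the sketch below. The source view name, key column, and table name are simplified placeholders, not our real identifiers:

```python
import dlt
from pyspark.sql.functions import expr

# Target streaming table that holds the SCD2 history
dlt.create_streaming_table("accounts_scd2")

dlt.create_auto_cdc_flow(
    target="accounts_scd2",
    source="accounts_cdc_feed",        # placeholder source view
    keys=["account_id"],               # placeholder business key
    sequence_by="date_extraction",
    apply_as_deletes=expr("operation = 'DELETE'"),
    stored_as_scd_type=2,
)
```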
Our source data includes an operation column (INSERT, UPDATE, DELETE) and business fields such as account_close_dt (end date of account).
Observed Behavior
When a record arrives with:
operation = 'DELETE'
and a populated business field (e.g., account_close_dt)
We observe that:
The existing SCD2 record is closed (__END_AT is set correctly)
However, business columns such as account_close_dt are NOT updated in the target table
This results in loss of important business information associated with the closing event.
Expected Behavior
For our use case, we expect that:
When a "DELETE" event contains updated business fields (e.g., the account closure date),
those fields should be reflected in the closing record.
Currently that does not happen: only __END_AT is updated, and the business columns keep their previous values.
Key Question
Is this behavior expected when using apply_as_deletes?
Specifically:
When an event matches apply_as_deletes, is it always treated as a pure delete (i.e., only closing the record), ignoring all other column updates?
Is there any supported way within create_auto_cdc_flow to propagate business column updates along with delete events?
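One workaround we have considered (we are unsure whether it is a supported pattern, hence this question) is to pre-process the feed so that each DELETE carrying business data is split into a synthetic UPDATE, sequenced just before the DELETE, followed by the DELETE itself. The UPDATE would then write the business fields before the record is closed. A minimal illustration of the splitting logic on plain Python dicts, with hypothetical field names:

```python
from datetime import datetime, timedelta

def split_delete_events(events):
    """For each DELETE event that carries non-null business fields,
    emit an UPDATE sequenced one second earlier, then the DELETE.

    `events` is a list of dicts with 'operation' and 'date_extraction'
    keys; any other non-null key is treated as a business field.
    Illustrative sketch only, not production code.
    """
    out = []
    for ev in events:
        business = {k: v for k, v in ev.items()
                    if k not in ("id", "operation", "date_extraction")
                    and v is not None}
        if ev["operation"] == "DELETE" and business:
            update = dict(ev)
            update["operation"] = "UPDATE"
            # Sequence the synthetic UPDATE just before the DELETE so
            # AUTO CDC applies the column changes first, then the close.
            update["date_extraction"] = ev["date_extraction"] - timedelta(seconds=1)
            out.append(update)
        out.append(ev)
    return out
```

In the real pipeline this transformation would live in an intermediate view feeding the flow, but it doubles the event volume for such deletes, so we would prefer a native option if one exists.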
I would appreciate guidance on best practices for this pattern when using AUTO CDC.