Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Get CDF Metadata from an Overwritten Batch Source in DLT?

Ovasheli
New Contributor

Hello,

I'm working on a Delta Live Tables pipeline and need help with a data source challenge.

My source tables are batch-loaded SCD2 tables with CDF (Change Data Feed) enabled. These tables are updated daily using a complete overwrite operation.

For my DLT pipeline, I need to process the last 10 days of data and access the CDF metadata columns (_change_type, _commit_version, _commit_timestamp).
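Since the goal is the CDF metadata columns over a bounded window rather than streaming semantics per se, one pattern worth noting is reading the Change Data Feed in batch mode with the `readChangeFeed` and `startingTimestamp` options. A minimal sketch, assuming a hypothetical table name and that `spark` is the session available in a Databricks notebook or pipeline:

```python
from datetime import datetime, timedelta, timezone

def cdf_start_timestamp(days_back: int = 10) -> str:
    """Timestamp string for the CDF startingTimestamp option."""
    start = datetime.now(timezone.utc) - timedelta(days=days_back)
    return start.strftime("%Y-%m-%d %H:%M:%S")

def read_recent_changes(spark, table_name: str, days_back: int = 10):
    """Batch-read the Change Data Feed for the last `days_back` days.

    The returned DataFrame carries the CDF metadata columns
    _change_type, _commit_version and _commit_timestamp.
    """
    return (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingTimestamp", cdf_start_timestamp(days_back))
        .table(table_name)
    )

# Example (inside a notebook or pipeline):
# changes = read_recent_changes(spark, "my_catalog.my_schema.scd2_source")
```

A batch read re-evaluated on each pipeline update avoids the streaming checkpoint entirely, at the cost of re-reading the 10-day window every run.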

I've tried reading the source table as a stream. However, this fails with the error:

[DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example WRITE (Map(mode -> Overwrite, ...))) in the source table at version 438. This is currently not supported.

The error message suggests two options:

skipChangeCommits: Setting this option to true allows the stream to continue, but it ignores the overwrite changes, so my pipeline misses the new data.

restart with fresh checkpoint: This is not a viable option for a continuous DLT pipeline.
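For reference, the option the error message points at is set on the stream read roughly like this (a minimal sketch; the table name is hypothetical and `spark` is the ambient session):

```python
def scd2_changes_stream(spark, source_table: str):
    """Streaming read of a Delta table with skipChangeCommits set.

    Caveat, as described above: commits that update or overwrite
    existing rows are skipped entirely, so a daily complete overwrite
    of the source would be invisible to this stream.
    """
    return (
        spark.readStream.format("delta")
        .option("skipChangeCommits", "true")
        .table(source_table)
    )
```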

How can I get the CDF metadata for these batch source tables that are being overwritten, without having to manually restart the pipeline or lose data?

Is there a recommended pattern in DLT to handle this specific scenario?

Thanks in advance for your help!

1 REPLY

Sidhant07
Databricks Employee

Hi @Ovasheli ,

I believe the full error message looks something like the one below.

Error:

com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: [DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example DELETE (Map(predicate - in the source table at version 166. This is currently not supported. If this is going to happen regularly and you are okay to skip changes, set the option 'skipChangeCommits' to 'true'. If you would like the data update to be reflected, please restart this query with a fresh checkpoint directory or do a full refresh if you are using DLT. If you need to handle these changes, please switch to MVs. The source table can be found at path abfss://oagdataproductdev@oagdevweussa.dfs.core.windows.net/__unitystorage/catalogs/a093d514-54ed-4254-a57a-c1b59f7d3efd/tables/2723f5df-35f5-4430-afcb-495298b8db4c.

As mentioned in the error message, using a new checkpoint or setting skipChangeCommits to true should help resolve the issue.
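The error text also hints at switching to materialized views. In DLT, a table defined over a batch read is fully recomputed on each pipeline update, so source overwrites never invalidate a streaming checkpoint. A hedged sketch of that pattern (the table name, timestamp, and column names are assumptions; `dlt` and `spark` only exist inside a pipeline):

```python
# CDF metadata columns exposed by Delta's Change Data Feed.
CDF_COLS = ["_change_type", "_commit_version", "_commit_timestamp"]

def with_cdf_metadata(df, data_cols):
    """Project the business columns plus the CDF metadata columns."""
    return df.select(*data_cols, *CDF_COLS)

# Inside a DLT pipeline this could back a materialized view,
# recomputed on every update:
#
# import dlt
#
# @dlt.table(name="recent_scd2_changes")  # hypothetical name
# def recent_scd2_changes():
#     df = (spark.read.format("delta")
#           .option("readChangeFeed", "true")
#           .option("startingTimestamp", "2024-01-01")  # e.g. 10 days back
#           .table("my_catalog.my_schema.scd2_source"))
#     return with_cdf_metadata(df, ["id", "value"])  # hypothetical columns
```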

I have handled similar issues in the past, and these are the solutions we recommend to our users.

 
