Hello,
I'm working on a Delta Live Tables pipeline and need help with a data source challenge.
My source tables are batch-loaded SCD2 tables with CDF (Change Data Feed) enabled. These tables are updated daily using a complete overwrite operation.
For my DLT pipeline, I need to process the last 10 days of data and access the CDF metadata columns (_change_type, _commit_version, _commit_timestamp).
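To make the requirement concrete, here is a rough sketch of the kind of read I am after. It assumes a batch change feed read using the `readChangeFeed` and `startingTimestamp` reader options. The helper names and the UTC handling are my own illustration, not something taken from the pipeline itself.

```python
from datetime import datetime, timedelta, timezone

# CDF metadata columns that the downstream logic needs.
CDF_METADATA_COLUMNS = ["_change_type", "_commit_version", "_commit_timestamp"]


def cdf_window_options(lookback_days=10, now=None):
    """Build reader options for a change feed read starting `lookback_days` ago.

    Assumption: the timestamp is formatted in UTC; the session time zone
    may need to be taken into account in a real pipeline.
    """
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=lookback_days)
    return {
        "readChangeFeed": "true",
        "startingTimestamp": start.strftime("%Y-%m-%d %H:%M:%S"),
    }


def read_recent_changes(spark, table_name, lookback_days=10):
    """Batch-read the change feed of `table_name` (hypothetical helper).

    `spark` is expected to be an active SparkSession; nothing is read
    until this function is called.
    """
    return (
        spark.read
        .options(**cdf_window_options(lookback_days))
        .table(table_name)
    )
```

Calling `read_recent_changes(spark, "<source table>")` should return the source columns plus the three metadata columns, provided the requested start time is still within the table's retained history.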
I've tried reading the source table as a stream. However, this fails with the error:
[DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example WRITE (Map(mode -> Overwrite, ...))) in the source table at version 438. This is currently not supported.
The error message suggests two options:
skipChangeCommits: Setting this option to true lets the stream continue, but it skips the overwrite commits entirely, so my pipeline misses the new data.
Restart with a fresh checkpoint: This is not a viable option for a continuous DLT pipeline.
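For reference, the first option is set on the streaming reader roughly as in the sketch below. The function names are hypothetical and only illustrate where the option goes. As I understand it, commits that rewrite existing data are skipped rather than replayed, which is why the overwritten data never arrives downstream.

```python
def skip_change_commits_options():
    """Reader options that tell the Delta streaming source to skip
    commits that update, delete, or overwrite existing data."""
    return {"skipChangeCommits": "true"}


def stream_source(spark, table_name):
    """Open a streaming read on `table_name` (hypothetical helper).

    `spark` is expected to be an active SparkSession.
    """
    return (
        spark.readStream
        .options(**skip_change_commits_options())
        .table(table_name)
    )
```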
How can I get the CDF metadata for these batch source tables that are being overwritten, without having to manually restart the pipeline or lose data?
Is there a recommended pattern in DLT to handle this specific scenario?
Thanks in advance for your help!