Hi everyone,
I'm running a DLT pipeline that loads data from Bronze to Silver using dlt.apply_changes (SCD Type 2).
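For context, the apply_changes setup looks roughly like this (simplified sketch; the table, key, and sequence-column names here are hypothetical, not the real ones):

```python
import dlt  # Delta Live Tables module, only available inside a DLT pipeline
from pyspark.sql.functions import col

# Hypothetical names -- the real pipeline uses different tables/columns.
dlt.create_streaming_table("silver_customers")

dlt.apply_changes(
    target="silver_customers",     # Silver table maintained by apply_changes
    source="bronze_customers",     # Bronze CDC feed, read as a stream
    keys=["customer_id"],          # primary key column(s)
    sequence_by=col("ingest_ts"),  # ordering column for change events
    stored_as_scd_type=2,          # SCD Type 2: keep full row history
)
```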
The first run of the pipeline worked fine -- data was written successfully into the target Silver tables.
However, when I ingested new data the next day and re-ran the pipeline, it failed with the following error:
```
Query [id = 85706ddc-02af-426c-9ba8-ab1da903b5c8, runId = 1507aa2e-c77e-4309-906b-1ed0afe25eed]
terminated with exception: [DIFFERENT_DELTA_TABLE_READ_BY_STREAMING_SOURCE]
The streaming query was reading from an unexpected Delta table
(id = '77d0eb65-f733-4f7d-b6b5-5f7c25fc9264').
It used to read from another Delta table (id = '474c3622-d677-4882-8f31-ceb9275e90d9')
according to checkpoint. This may happen when you changed the code to read from a new
table or you deleted and re-created a table. Please revert your change or delete your
streaming query checkpoint to restart from scratch.
```
I understand this means the source Delta table's ID has changed, but I didn't intentionally modify the source table's schema or the pipeline logic.
It looks like the issue happens when dlt.apply_changes tries to update existing data on the second run.
Questions:
- What is the best practice to prevent this "unexpected Delta table ID" error?
- Is there a way to safely refresh or modify the source tables without breaking the streaming checkpoints?