The most likely reason your DLT pipeline shows 50K updates but the records remain unchanged comes down to how Delta Live Tables (DLT) handles streaming tables, update logic, and schema constraints. When the target table uses an auto-increment ID (especially as a surrogate key in a dimension table), DLT may not update a row if the update logic (MERGE, APPLY CHANGES, etc.) never matches it on the join condition, or if the table is configured as append-only, in which case updates are ignored and only inserts flow through.
Key Issues and Causes
- Append-Only Streaming Limitations: If your silver dimension table is defined as an append-only streaming table, updates and deletes in the source are never applied. Only inserts are appended; changes to existing rows are ignored until a full refresh (see the sketch after this list).
- Streaming Join Semantics: Streaming dimension joins often require the source to be append-only. If the join does not match on your auto-incrementing ID, DLT may skip processing the updates, as in your scenario where URLs are linked to those IDs.
- Update Logic and Constraints: If the pipeline uses logic such as a MERGE whose matching condition only ever triggers inserts, or skips updates for certain keys, detected changes will not propagate until a full refresh occurs.
- Auto-Incrementing ID Issue: Using an auto-incrementing column as the join or merge key is problematic because rows whose business attributes changed never match on the technical key (ID). The pipeline then appears to have processed changes without physically updating any rows.
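For illustration, here is a minimal sketch of the append-only pattern described above (table names are assumptions, not your actual schema). A streaming table defined this way only ever appends; with skipChangeCommits (or the older ignoreChanges) set so the stream tolerates source updates, those update commits are skipped rather than applied:

```python
import dlt

# Hypothetical append-only silver dimension fed by a streaming read of bronze.
# skipChangeCommits lets the stream tolerate UPDATE/DELETE commits in the
# bronze source by skipping them, so 50K updates can flow through the
# pipeline's bookkeeping without ever changing a silver row.
@dlt.table(name="silver_dim_customer")
def silver_dim_customer():
    return (
        spark.readStream
        .option("skipChangeCommits", "true")  # updates/deletes ignored, not applied
        .table("bronze.customer")
    )
```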
Recommendations
- Enable Change Data Feed (CDF): For updates to be reflected promptly, enable the Delta Change Data Feed on your source tables. This lets DLT see update and delete events, not just inserts, so downstream silver/gold tables receive true change events instead of misleading update counts (see the change-feed sketch after this list).
- Review Table Definition and Pipeline Settings:
  - Ensure your Silver table is not in "append-only" mode; switch to a mode that supports updates (for example, MERGE INTO or APPLY CHANGES; see the APPLY CHANGES sketch after this list).
  - Verify that auto-incrementing IDs are not used as the primary merge/join condition for updates. Use business keys to identify the rows to update instead.
- Incremental Loads With Surrogate Keys: Consider maintaining both the business key (for updates) and the surrogate key (for URLs) in the table, but prevent the DLT logic from relying on the surrogate key when determining updates.
- Diagnostics:
  - Review the pipeline logs and check whether updates are being skipped due to key mismatches, schema enforcement, or mode restrictions (an event-log query sketch follows this list).
  - Validate that streaming updates flow as expected with a small test set before scaling to the full dataset.
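As a sketch of the CDF recommendation (the table name and starting version are assumptions): enable the change data feed on the source, then read it as a stream so updates and deletes arrive as explicit change events:

```python
# Enable the change data feed on the (hypothetical) bronze source table.
spark.sql("""
    ALTER TABLE bronze.customer
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the change feed as a stream. Each row carries _change_type
# ('insert', 'update_preimage', 'update_postimage', 'delete') plus the
# _commit_version and _commit_timestamp metadata columns.
bronze_changes = (
    spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)  # or startingTimestamp; adjust to your table history
    .table("bronze.customer")
)
```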
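And a sketch of APPLY CHANGES keyed on the business key rather than the surrogate ID (the source view name, key column, and metadata columns below are assumptions):

```python
import dlt
from pyspark.sql.functions import expr

# Target streaming table for the silver dimension.
dlt.create_streaming_table("silver_dim_customer")

# Apply changes matched on the business key, so updates land on existing rows
# and the auto-increment surrogate ID (and the URLs built on it) stays put.
dlt.apply_changes(
    target="silver_dim_customer",
    source="bronze_customer_changes",        # hypothetical view over the CDF stream
    keys=["customer_business_key"],          # business key, not the auto-increment ID
    sequence_by=expr("_commit_timestamp"),   # ordering column from the change feed
    apply_as_deletes=expr("_change_type = 'delete'"),
    except_column_list=["_change_type", "_commit_version", "_commit_timestamp"],
)
```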
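For the diagnostics step, a sketch of querying the DLT event log (the storage path is a placeholder; adjust the selected fields to what your event log actually contains) to check whether flows really wrote rows:

```python
# Read the pipeline's event log; the path below is a placeholder for the
# pipeline's configured storage location.
events = spark.read.format("delta").load("/pipelines/<pipeline-id>/system/events")

# flow_progress events report per-flow metrics; if updates are being skipped,
# num_output_rows stays flat even though the source reports thousands of changes.
(
    events
    .filter("event_type = 'flow_progress'")
    .selectExpr(
        "timestamp",
        "origin.flow_name",
        "details:flow_progress.status",
        "details:flow_progress.metrics.num_output_rows",
    )
    .orderBy("timestamp", ascending=False)
    .show(truncate=False)
)
```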
Alternatives
- Manual Use of MERGE: If the DLT automation doesn't suffice, you may need to write an explicit MERGE in your pipeline logic (rather than relying solely on DLT-managed APPLY CHANGES), handling the surrogate keys carefully (a sketch follows this list).
- Avoid Full Refreshes: If you must avoid a full refresh because of the ID constraints and legacy URLs, adjust the incremental update strategy to MERGE only on business keys and to update dimension attributes rather than insert new IDs.
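A minimal sketch of that manual MERGE (table and column names are assumptions): match on the business key, update only the descriptive attributes so the existing auto-increment ID and its URLs survive, and insert only genuinely new business keys:

```python
from delta.tables import DeltaTable

# Hypothetical batch of changed rows from the source.
source_updates = spark.table("bronze.customer_changes")

silver = DeltaTable.forName(spark, "silver.dim_customer")

(
    silver.alias("t")
    .merge(
        source_updates.alias("s"),
        "t.customer_business_key = s.customer_business_key",  # business key, not the ID
    )
    # Update descriptive attributes only; t.id is never touched, so URLs keep resolving.
    .whenMatchedUpdate(set={
        "name": "s.name",
        "url_slug": "s.url_slug",
        "updated_at": "s.updated_at",
    })
    # New business keys get inserted; the surrogate id is assumed to be generated
    # by the table itself (e.g. an identity column), so it is omitted here.
    .whenNotMatchedInsert(values={
        "customer_business_key": "s.customer_business_key",
        "name": "s.name",
        "url_slug": "s.url_slug",
        "updated_at": "s.updated_at",
    })
    .execute()
)
```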
These changes should help your DLT pipeline pick up source table changes without needing a disruptive full refresh, preserving your URL scheme based on existing auto-incrementing IDs.