03-25-2026 11:30 PM
The issue arises because your framework collapses multiple CDF events per key with a window and keeps only the row with the latest _commit_version, which breaks update semantics.
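To make the failure concrete, here is a minimal plain-Python model, not Spark code. It assumes a key column `id`, a payload `value`, made-up rows, and a hypothetical MERGE that only has a "when not matched, insert" branch (one reading of "skips updates when the record already exists"). The helper names `latest_per_key` and `insert_only_merge` are illustrative only.

```python
# Plain-Python model of "keep only the latest _commit_version per key"
# followed by an insert-only MERGE. Rows and helper names are hypothetical.

def latest_per_key(events):
    """Keep one event per id: the one with the highest _commit_version.

    A delete and an insert for the same id can share a _commit_version;
    here the last row seen wins, which is no more reliable than
    row_number() over a non-unique sort key.
    """
    latest = {}
    for e in events:
        cur = latest.get(e["id"])
        if cur is None or e["_commit_version"] >= cur["_commit_version"]:
            latest[e["id"]] = e
    return list(latest.values())


def insert_only_merge(target, events):
    """Hypothetical MERGE with only a WHEN NOT MATCHED THEN INSERT branch."""
    for e in events:
        if e["_change_type"] == "insert" and e["id"] not in target:
            target[e["id"]] = e["value"]
    return target


target = {1: "old"}
cdf = [
    {"id": 1, "value": "old", "_change_type": "delete", "_commit_version": 5},
    {"id": 1, "value": "new", "_change_type": "insert", "_commit_version": 5},
]
print(insert_only_merge(target, latest_per_key(cdf)))  # {1: 'old'}: update lost
```

In this model the update is lost whichever row the tie-break keeps: if the insert survives, the key already exists and is skipped; if the delete survives, the insert-only branch ignores it.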
In Delta CDF, an update is not a single event. An UPDATE or MERGE on the source emits an update_preimage/update_postimage pair, while a row that is rewritten (for example, by an overwrite) shows up as a delete + insert pair. By selecting only the latest insert, you lose the preceding delete, which causes the MERGE to skip the update when the record already exists in the target.

The correct approach is to process CDF events without collapsing them prematurely:
- Use _change_type explicitly in your MERGE logic to handle delete, insert, and update_postimage correctly (update_preimage rows describe the old state and can be filtered out).
- Use the update_postimage records directly instead of inferring updates manually.
- If windowing is unavoidable, detect delete → insert patterns for the same key and treat them as updates.
- Preserve event order via _commit_version, with a tie-break on _change_type, because a delete and an insert for the same key can share a _commit_version.
- For a more robust solution, use Databricks' APPLY CHANGES INTO, which natively handles ordering (via SEQUENCE BY) and update semantics.

Avoid relying solely on "latest state = truth" in event-driven systems. With these adjustments, updates will propagate correctly to the target table.
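A minimal plain-Python sketch of a _change_type-aware apply, again assuming a key column `id`, a payload `value`, and made-up rows. The rank table `CHANGE_RANK` is an assumed tie-break rule (pre-change rows before post-change rows within one commit), and `collapse`/`apply_changes` are hypothetical helpers, not a Databricks API. A delete followed by an insert for the same key ends up as an upsert, i.e. it is treated as an update.

```python
# Plain-Python sketch: collapse CDF rows per key in a deterministic order,
# then apply them according to _change_type. Names are hypothetical.

# Assumed tie-break within one _commit_version: "before" images rank below
# "after" images, so the post-change state always wins.
CHANGE_RANK = {"update_preimage": 0, "delete": 1, "insert": 2, "update_postimage": 3}


def collapse(events):
    """Return the final event per id, ordered by (_commit_version, rank)."""
    final = {}
    ordered = sorted(
        events, key=lambda e: (e["_commit_version"], CHANGE_RANK[e["_change_type"]])
    )
    for e in ordered:
        if e["_change_type"] == "update_preimage":
            continue  # old row image; nothing to apply to the target
        final[e["id"]] = e
    return final


def apply_changes(target, events):
    """delete removes the key; insert and update_postimage upsert it."""
    for key, e in collapse(events).items():
        if e["_change_type"] == "delete":
            target.pop(key, None)  # like WHEN MATCHED AND delete THEN DELETE
        else:
            target[key] = e["value"]  # like WHEN MATCHED UPDATE / NOT MATCHED INSERT
    return target


target = {1: "old", 2: "keep", 3: "gone"}
cdf = [
    # id 1 rewritten as delete + insert in one commit
    {"id": 1, "value": "old", "_change_type": "delete", "_commit_version": 5},
    {"id": 1, "value": "new", "_change_type": "insert", "_commit_version": 5},
    # id 2 changed by an UPDATE: pre/post image pair
    {"id": 2, "value": "keep", "_change_type": "update_preimage", "_commit_version": 6},
    {"id": 2, "value": "changed", "_change_type": "update_postimage", "_commit_version": 6},
    # id 3 deleted, id 4 newly inserted
    {"id": 3, "value": "gone", "_change_type": "delete", "_commit_version": 7},
    {"id": 4, "value": "fresh", "_change_type": "insert", "_commit_version": 7},
]
print(apply_changes(target, cdf))  # {1: 'new', 2: 'changed', 4: 'fresh'}
```

In PySpark, the same idea would mean deriving a rank column from _change_type and ordering the window by _commit_version plus that rank, then mapping each surviving _change_type to its own MERGE clause.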
Thanks & Regards,