Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Issue while handling Deletes and Inserts in Structured Streaming

bricks_2026
New Contributor

Hello

We have a framework which reads the CDF logs from a source table and merges them into a target table.

The logic is implemented so that, if there are multiple commit versions for a record in the source table, a window function is applied to identify the last operation.

That last operation is then replicated to the target.

This works as long as only Insert and Update operations occur.

In one particular scenario, an update of a record is performed by first deleting the row (commit_type = delete) and then inserting it (commit_type = insert) with the changed data. Our framework cannot handle this scenario: it selects only the latest commit_type = insert event and does not consider the preceding delete. Since the record already exists in the target, the logic does nothing and ignores the record entirely. As a result, we are missing updates in the target table. Can you suggest ways to fix this issue?
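To make the failure mode concrete, here is a minimal pure-Python simulation of the logic described above. The column names (`_change_type`, `_commit_version`) follow Delta CDF conventions, but the event values and the "insert only if the key is absent" merge rule are illustrative reconstructions of the framework's behavior, not its actual code:

```python
# Target table already contains the record that is being "updated" upstream.
target = {1: "old"}  # key -> value

# CDF events for one batch: the upstream update arrives as delete + insert.
events = [
    {"key": 1, "_change_type": "delete", "_commit_version": 5, "value": "old"},
    {"key": 1, "_change_type": "insert", "_commit_version": 6, "value": "new"},
]

# Window-function equivalent: keep only the event with the highest
# _commit_version per key, which discards the preceding delete.
latest = {}
for e in events:
    if e["key"] not in latest or e["_commit_version"] > latest[e["key"]]["_commit_version"]:
        latest[e["key"]] = e

# Replicate the surviving event to the target.
for key, e in latest.items():
    if e["_change_type"] == "insert" and key not in target:
        target[key] = e["value"]      # plain insert for new keys only
    elif e["_change_type"] == "delete":
        target.pop(key, None)

print(target)  # {1: 'old'} -- the update is silently lost
```

The surviving event is the insert for a key that already exists, so the merge takes no action and the new value never reaches the target.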

Thanks in advance for your support. 

Best Regards

1 REPLY

Sumit_7
Honored Contributor

The issue arises because your framework collapses multiple CDF events using a window function and retains only the latest commit_version, which breaks update semantics.

In Delta CDF, an update performed this way is represented as a delete + insert pair, not a single event. By selecting only the latest insert, you lose the preceding delete, so the MERGE skips the record when it already exists in the target. The correct approach is to avoid collapsing CDF events prematurely:

- Use `_change_type` explicitly in your MERGE logic so that `delete`, `insert`, and `update_postimage` events are each handled correctly.
- Where the source emits true updates, consume the `update_postimage` rows directly instead of inferring updates manually.
- If windowing is unavoidable, detect delete → insert patterns for the same key within a batch and treat them as updates.
- A more robust solution is Databricks' `APPLY CHANGES INTO`, which natively handles ordering and update semantics.
- Preserve event order via `_commit_version`, and avoid relying solely on "latest state = truth" in event-driven systems.

This adjustment will correctly propagate updates to the target table.
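As a rough illustration of the "detect delete → insert patterns" suggestion, here is a pure-Python sketch. It examines all events per key within a batch rather than keeping only one, and maps any key whose final event is an insert to an upsert (which covers both plain inserts and delete + insert pairs). Names and data are hypothetical; in a real pipeline this grouping would be done in Spark before the MERGE:

```python
from collections import defaultdict

# Existing target state: key 1 will be updated, key 2 deleted, key 3 inserted.
target = {1: "old", 2: "gone"}

events = [
    {"key": 1, "_change_type": "delete", "_commit_version": 5, "value": "old"},
    {"key": 1, "_change_type": "insert", "_commit_version": 6, "value": "new"},
    {"key": 2, "_change_type": "delete", "_commit_version": 6, "value": "gone"},
    {"key": 3, "_change_type": "insert", "_commit_version": 6, "value": "fresh"},
]

# Group events per key, preserving commit order instead of discarding history.
by_key = defaultdict(list)
for e in sorted(events, key=lambda e: e["_commit_version"]):
    by_key[e["key"]].append(e)

for key, evs in by_key.items():
    last = evs[-1]  # final state of this key within the batch
    if last["_change_type"] == "insert":
        # Upsert: handles both a new insert and a delete -> insert "update".
        target[key] = last["value"]
    elif last["_change_type"] == "delete":
        target.pop(key, None)

print(target)  # {1: 'new', 3: 'fresh'}
```

The key difference from the original logic is that the final insert is applied as an upsert rather than an insert-if-absent, so a delete + insert pair lands as an update instead of being ignored.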

Thanks & Regards,