Merge into and data loss
05-15-2022 08:55 PM
I have a Delta table with 20 million rows that is updated dozens of times per day with MERGE INTO. The merge worked fine for a year, but recently I began noticing that some data is deleted by MERGE INTO even though no delete is specified. The MERGE INTO statements only perform updates.
I have published a test notebook, but I am unable to reproduce the issue there.
I can, however, reproduce the issue in my production database.
Has anyone encountered a similar issue before?
- Labels: Dataloss, Delta table, Merge Into
06-01-2022 07:44 PM
Hi @lizou, if there is no delete condition in the MERGE command's WHEN MATCHED clause, it should not delete any data. Was VACUUM run on the table before you started facing the issue?
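To illustrate the point above, here is a plain-Python model (not Delta Lake code) of what an update-only MERGE, i.e. `WHEN MATCHED THEN UPDATE` with no `DELETE` or `INSERT` clause, does to the target: every target row comes out exactly once, so the row count cannot go down. `merge_update_only` and the rows-as-dicts representation are hypothetical, chosen only for this sketch.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List

Row = Dict[str, object]


def merge_update_only(target: List[Row], source: Iterable[Row], key: str) -> List[Row]:
    """Hypothetical model of update-only MERGE semantics (not the Delta Lake implementation).

    Matched target rows take the source values, unmatched target rows are
    copied through unchanged, and unmatched source rows are ignored because
    there is no INSERT clause. No branch ever drops a target row.
    """
    by_key: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in source:
        by_key[row[key]].append(row)

    result: List[Row] = []
    for row in target:
        matches = by_key.get(row[key], [])
        if len(matches) > 1:
            # Delta Lake rejects a MERGE where several source rows match one target row
            raise ValueError(f"multiple source rows match target key {row[key]!r}")
        if matches:
            result.append({**row, **matches[0]})  # WHEN MATCHED THEN UPDATE SET *
        else:
            result.append(dict(row))  # not matched: row is kept as-is
    return result
```

For example, merging `[{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]` into `[{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]` on `"id"` updates row 2, ignores row 3, and still returns two rows.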
We won't be able to access the notebook without a support ticket. If you don't find any delete condition in the MERGE command and no VACUUM was run on the table, you can create a support case for further analysis. You can raise a support ticket here: https://help.databricks.com/s/
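Before opening a ticket, the table history is a reasonable place to look, since every commit records its operation and metrics. The sketch below assumes you have already collected `DESCRIBE HISTORY <table>` rows into dicts (for example with PySpark's `Row.asDict()`), and that MERGE commits report a `numTargetRowsDeleted` metric as recent Delta Lake releases do; verify the metric names on your runtime. `suspicious_versions` is a hypothetical helper, not a Databricks API.

```python
from typing import List, Mapping


def suspicious_versions(history: List[Mapping[str, object]]) -> List[int]:
    """Return table versions whose history entries show rows being removed.

    Assumes each entry mirrors a DESCRIBE HISTORY row converted to a dict:
    'version' (int), 'operation' (str) and 'operationMetrics'
    (dict of str -> str, possibly None).
    """
    flagged = []
    for entry in history:
        op = entry.get("operation")
        metrics = entry.get("operationMetrics") or {}
        # Metric values are stored as strings in the history, so convert before comparing
        if op == "MERGE" and int(metrics.get("numTargetRowsDeleted", "0")) > 0:
            flagged.append(entry["version"])
        elif op == "DELETE":
            flagged.append(entry["version"])
    return sorted(flagged)
```

An update-only MERGE should never show up in this list; if one does, that version is the one to attach to the support case.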
06-05-2022 03:38 PM
I can't reproduce the issue anymore. For now, I am going to limit the number of MERGE INTO commands, since the intermediate data transformations do not need version history. I am going to try using combined views for each step and do a one-time MERGE INTO the final table at the end.
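The plan in this reply (chain the intermediate transformations as views, then write to the final table once) can be sketched in plain Python, assuming each step behaves like a view: a lazy function over rows that only runs when the final merge consumes it. `compose`, `final_merge`, `drop_nulls` and `upper_v` are hypothetical stand-ins, not Spark or Delta APIs.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterator[Row]]


def compose(steps: List[Step]) -> Step:
    """Chain steps like stacked views; generators mean nothing runs until consumed."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, iter(rows))


def drop_nulls(rows: Iterable[Row]) -> Iterator[Row]:
    # Hypothetical intermediate step: filter out rows without a value
    return (r for r in rows if r.get("v") is not None)


def upper_v(rows: Iterable[Row]) -> Iterator[Row]:
    # Hypothetical intermediate step: normalize the value column
    return ({**r, "v": str(r["v"]).upper()} for r in rows)


def final_merge(target: List[Row], staged: Iterable[Row], key: str) -> List[Row]:
    """Apply the staged rows to the target once, update-only.

    Assumes staged keys are unique: here a later duplicate overwrites an
    earlier one, whereas Delta's MERGE raises an error when several source
    rows match one target row.
    """
    updates = {row[key]: row for row in staged}
    return [{**row, **updates[row[key]]} if row[key] in updates else dict(row) for row in target]
```

Only `final_merge` touches the target, so in the Delta setting this would correspond to a single MERGE commit (one history entry) per run instead of one per intermediate step.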