Data Engineering

Merge into and data loss

lizou
Contributor II

I have a Delta table with 20 million rows. The table is updated dozens of times per day using MERGE INTO, and the merge worked fine for a year. Recently, though, I began to notice that some data is being deleted by MERGE INTO even though no delete is specified. The MERGE INTO only does updates.

I have published a test notebook, but am unable to reproduce the issue.

I can reproduce the issue in my production database.

Has anyone encountered a similar issue before?

https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6324...

2 REPLIES

Noopur_Nigam
Valued Contributor II

Hi @lizou, if no delete condition is specified in the merge command's WHEN MATCHED clause, the merge should not delete any data. Was a VACUUM run on the table shortly before you started seeing the issue?

We won't be able to access the notebook without a support ticket. If you don't find any delete condition in the merge command, and no VACUUM was run on the table, you can create a support case for further analysis. You can raise a support ticket here: https://help.databricks.com/s/
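One way to check the two points raised above (a delete inside a merge, or a VACUUM on the table) is to scan the table history. The sketch below is only an illustration: it assumes the history has already been collected into a list of dicts, that operations are named `MERGE`, `DELETE`, `VACUUM START` and `VACUUM END`, and that merges report a `numTargetRowsDeleted` metric. The table name `my_db.my_table` and the sample rows are hypothetical, and older runtimes may not log VACUUM in the history at all.

```python
from typing import Any, Dict, List

# Operation names as they commonly appear in Delta table history.
# This is an assumption; verify the names on your own runtime.
SUSPECT_OPERATIONS = {"DELETE", "VACUUM START", "VACUUM END"}


def find_suspect_operations(history_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return history entries that might be related to missing rows."""
    suspects = []
    for row in history_rows:
        operation = row.get("operation", "")
        metrics = row.get("operationMetrics") or {}
        # Metric values are strings in the history output, so convert first.
        deleted = int(metrics.get("numTargetRowsDeleted", 0) or 0)
        if operation in SUSPECT_OPERATIONS or (operation == "MERGE" and deleted > 0):
            suspects.append(
                {"version": row.get("version"), "operation": operation, "rows_deleted": deleted}
            )
    return suspects


# On a cluster, the rows could come from something like (hypothetical table name):
#   history_rows = [r.asDict() for r in spark.sql("DESCRIBE HISTORY my_db.my_table").collect()]
# A hand-written sample stands in for that output here.
sample_history = [
    {"version": 12, "operation": "MERGE",
     "operationMetrics": {"numTargetRowsUpdated": "150", "numTargetRowsDeleted": "0"}},
    {"version": 13, "operation": "VACUUM START", "operationMetrics": {}},
    {"version": 14, "operation": "MERGE",
     "operationMetrics": {"numTargetRowsUpdated": "90", "numTargetRowsDeleted": "7"}},
]

suspects = find_suspect_operations(sample_history)
for entry in suspects:
    print(entry)
```

If a flagged version turns up, comparing the table just before and after it with time travel (`VERSION AS OF`) would be a reasonable next step before opening the support case.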

lizou
Contributor II

I can't reproduce the issue anymore. For now, I am going to limit the number of MERGE INTO commands, since the intermediate data transformations do not need version history. I am going to try using combined views for each step, and do a single MERGE INTO the final table at the end.
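A rough sketch of that plan, for illustration only: intermediate steps become temporary views, and a single update-only MERGE INTO runs at the end. All table, view, and column names (`staging_orders`, `step1_cleaned`, `step2_latest`, `final_orders`, `id`, `amount`, `status`) and the helper `build_update_only_merge` are hypothetical. The code only builds the SQL strings; running them with `spark.sql` is left as a comment.

```python
from typing import List


def build_update_only_merge(
    target: str, source: str, key_cols: List[str], update_cols: List[str]
) -> str:
    """Build a MERGE INTO statement that only updates matched rows."""
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}"
    )


statements = [
    # Step 1: an intermediate transformation kept as a view, not a Delta table.
    "CREATE OR REPLACE TEMP VIEW step1_cleaned AS "
    "SELECT id, amount, status FROM staging_orders WHERE id IS NOT NULL",
    # Step 2: collapse to one row per key so each target row matches at most one source row.
    "CREATE OR REPLACE TEMP VIEW step2_latest AS "
    "SELECT id, MAX(amount) AS amount, MAX(status) AS status "
    "FROM step1_cleaned GROUP BY id",
    # Final step: the only MERGE INTO, with no DELETE or INSERT clause.
    build_update_only_merge("final_orders", "step2_latest", ["id"], ["amount", "status"]),
]

for stmt in statements:
    print(stmt + ";\n")

# On a cluster, the statements could be executed in order:
#   for stmt in statements:
#       spark.sql(stmt)
```

Only the final statement writes to a Delta table, so the intermediate steps add no versions to any table history.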