How to perform UPCERTS and DELETE missing records from Target table using MERGE command?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-24-2022 08:14 AM
Hi have the following case. I want to perform upcert operation. But along with upcert, I want to delete the records which are missing in source table, but present in the target table. You can think it as a master data update.
- Source table contains a full set of master data. This is the latest incoming data.
- Target table contains the full set of master data. This is old data present in the current database.
- The source may contains new records, updates to some existing records, or some records might have removed compared to the target.
- So during MERGE operation, I want to update the matching records in target from source, Insert new incoming records from the source, and delete the records from target which were not present in the source. How to achieve this with databricks MERGE?
I see that the similar operation is possible in MYSQL Server, as shown below. But Databricks SQL doesn't support "BY SOURCE" option.
--Synchronize the target table with refreshed data from source table
MERGE Products AS TARGET
USING UpdatedProducts AS SOURCE
ON (TARGET.ProductID = SOURCE.ProductID)
--When records are matched, update the records if there is any change
WHEN MATCHED AND TARGET.ProductName <> SOURCE.ProductName OR TARGET.Rate <> SOURCE.Rate
THEN UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
--When no records are matched, insert the incoming records from source table to target table
WHEN NOT MATCHED BY TARGET
THEN INSERT (ProductID, ProductName, Rate) VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
--When there is a row that exists in target and same record does not exist in source then delete this record target
WHEN NOT MATCHED BY SOURCE
THEN DELETE
Thanks
Krishna
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-25-2022 12:10 AM
the WHEN NOT MATCHED clause can only contain an INSERT action.
So if you want to delete records, you should catch those records in the WHEN MATCHED clause.
If you have a record indicator (insert, update, delete) that is possible.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-25-2022 05:16 AM
Any other option to DELETE with single MERGE command, where in we can perform, all insert, update, delete operations. Delete is mainly to delete any stale data from the target table. Record indicator, means a extra column in the database schema?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-25-2022 05:21 AM
The record indicator is indeed a column which contains what type of data the record is (insert; delete, update). You might get this from a CDC system, or you can construct it yourself.
In case you have a cdc system, this is pretty straightforward as you can identify the deleted records and so you can use when matched.
Otherwise you will have to detect the deletes in another way. What I do if there is no CDC system available is to extract the PK of the concerning table and compare with that.
The performance will be quite good as you only select the PK (clustered index).
PS. I also never delete records on our data lake, I flag them as deleted or archived.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-22-2022 07:31 AM
Hey @Krishna Puthran
Hope you are well.
Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too.
Cheers!

