- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-25-2022 01:04 AM
Detection deletions does not work out of the box.
The merge statement will evaluate the incoming data against the existing data. It will not check the existing data against the incoming data.
To mark deletions, you will have to specifically update those records.
How do you do this?
This depends, if your source system detects deletions, they can probably be added to the incoming data and you can use the merge statement.
But in many cases deletes are not detected.
In that case you will have to compare the current state of the delta lake table against a complete current version of the data.
Getting the latter can be an issue. What I typically do (in case of an RDBMS source system) is to fetch the complete clustered index of the table (which will go pretty fast, even when there is a lot of data) and use that to compare + left_anti join.
There is also change data feed on delta lake, which you can use, but this only works if your source data is delta lake format.
Does this make any sense?