merge into deletes from SOURCE
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-12-2023 01:42 PM
I am using the following query to make an upsert:
MERGE INTO my_target_table AS target
USING (SELECT MAX(__my_timestamp) AS checkpoint FROM my_source_table) AS source
ON target.name = 'some_name'
AND target.address = 'some_address'
WHEN MATCHED AND source.checkpoint IS NOT NULL THEN
UPDATE SET checkpoint = source.checkpoint
WHEN NOT MATCHED THEN
INSERT (name, address, checkpoint)
VALUES ('some_name', 'some_address', source.checkpoint)
Whenever it does 'insert', it also deletes from *my_source_table*. Any explanation why it deletes from *my_source_table* and can I avoid it, so the logic will stay the same, without anything being deleted from the source
- Labels:
-
Source
-
Source Table
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-12-2023 02:36 PM
Can you provide the table history of your source table? Your logic appears correct. The source history should tell us if a delete is actually happening.
Or at least a before and after state of your table
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-12-2023 06:46 PM
I was using a view for my_source_table, once I changed that to be a table the issue stoped.
That unblocked me, but I think Databricks has a bug with using MERGE INTO from a VIEW
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-09-2023 09:44 AM
It should work the same whether it is a view of a table. But I am confused on how it was deleting data from a view.
In any case, happy this is resolved! And you are unblocked. If you believe there is a bug please provide more details on how we can replicate the issue and we can look into it.

