05-12-2023 01:42 PM
I am using the following query to perform an upsert:
MERGE INTO my_target_table AS target
USING (SELECT MAX(__my_timestamp) AS checkpoint FROM my_source_table) AS source
ON target.name = 'some_name'
  AND target.address = 'some_address'
WHEN MATCHED AND source.checkpoint IS NOT NULL THEN
  UPDATE SET checkpoint = source.checkpoint
WHEN NOT MATCHED THEN
  INSERT (name, address, checkpoint)
  VALUES ('some_name', 'some_address', source.checkpoint)
Whenever the MERGE performs an insert, rows are also deleted from *my_source_table*. Why does it delete from *my_source_table*, and how can I avoid that while keeping the same logic, so that nothing is deleted from the source?
05-12-2023 02:36 PM
Can you provide the table history of your source table? Your logic appears correct. The source history should tell us if a delete is actually happening.
Or at least share the before and after state of your table.
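A minimal sketch of how that history and a before/after comparison could be collected, assuming my_source_table is a Delta table (DESCRIBE HISTORY applies to Delta tables, not to views). The version number 3 is only a placeholder, not a value from this thread.

```sql
-- List the recent operations recorded for the source table.
-- A DELETE, MERGE, or WRITE entry timestamped at the time of the upsert
-- would indicate that the table was actually modified.
DESCRIBE HISTORY my_source_table;

-- Before/after check: run once before the MERGE and once after it,
-- then compare the two counts.
SELECT COUNT(*) AS row_count FROM my_source_table;

-- Compare against an earlier version of the table.
-- The version number 3 is a placeholder; take a real one from the
-- DESCRIBE HISTORY output above.
SELECT COUNT(*) AS row_count FROM my_source_table VERSION AS OF 3;
```

If the history shows no operation at the time of the MERGE, the rows are probably disappearing somewhere other than the source table itself.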
05-12-2023 06:46 PM
I was using a view for my_source_table. Once I changed it to a table, the issue stopped.
That unblocked me, but I think Databricks has a bug when using MERGE INTO with a view as the source.
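For anyone hitting the same thing, the workaround looks roughly like the sketch below. The names my_source_view and my_source_snapshot are hypothetical stand-ins (the real view definition was not posted), and only the MERGE statement itself comes from the original query.

```sql
-- Materialize the view into a table first (hypothetical names).
CREATE OR REPLACE TABLE my_source_snapshot AS
SELECT * FROM my_source_view;

-- Same upsert as before, but reading from the materialized table
-- instead of the view.
MERGE INTO my_target_table AS target
USING (SELECT MAX(__my_timestamp) AS checkpoint FROM my_source_snapshot) AS source
ON target.name = 'some_name'
  AND target.address = 'some_address'
WHEN MATCHED AND source.checkpoint IS NOT NULL THEN
  UPDATE SET checkpoint = source.checkpoint
WHEN NOT MATCHED THEN
  INSERT (name, address, checkpoint)
  VALUES ('some_name', 'some_address', source.checkpoint);
```

The trade-off is that the snapshot table has to be refreshed before each MERGE, since it no longer tracks the underlying data the way the view did.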
06-09-2023 09:44 AM
It should work the same whether the source is a view or a table. But I am confused about how it was deleting data from a view.
In any case, I'm happy this is resolved and you are unblocked! If you believe there is a bug, please provide more details on how we can replicate the issue and we can look into it.