cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

What is the behavior when merge key is not unique

Brad
Contributor

Hi, 

When using the MERGE statement, if merge key is not unique on both source and target, it will throw error. If merge key is unique in source but not unique in target, WHEN MATCHED THEN DELETE/UPDATE should work or not? For example merge key is id. Src has 1 row id = 1, target has multiple rows with id = 1. WHEN MATCHED will update or delete all rows in target for id = 1 or pick up 1 randomly?

Thanks

Brad

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager
Community Manager

Hi @BradWhen working with the MERGE statement in SQL, it’s essential to understand how it handles matching records between the source and target tables.

Let’s break down your scenario:

  1. Merge Key Not Unique on Both Source and Target:

    • If the merge key (in your case, the id) is not unique in both the source and target tables, the MERGE statement will indeed throw an error. It requires a unique match to perform the merge operation effectively.
  2. Merge Key Unique in Source but Not Unique in Target:

    • When the merge key is unique in the source but not unique in the target, the behavior depends on how you specify the WHEN MATCHED clause:
      • WHEN MATCHED THEN DELETE: This clause will delete all matching rows in the target table for the given merge key. In your example, if the source has a single row with id = 1, and the target has multiple rows with the same id = 1, executing WHEN MATCHED THEN DELETE will delete all those target rows with id = 1.
      • WHEN MATCHED THEN UPDATE: Similarly, if you use WHEN MATCHED THEN UPDATE, it will update all matching rows in the target for the specified merge key. Again, in your case, all target rows with id = 1 would be updated.
  3. Random Selection:

    • The MERGE statement itself does not inherently pick rows randomly. Instead, it processes all matching rows based on the specified action (delete or update).
    • If you want to control which specific row gets updated or deleted when there are multiple matches, you’ll need to add additional logic to your query. For example, you could use an additional column (such as a timestamp or an ordering column) to determine which row to update or delete.

In summary, the MERGE statement provides powerful capabilities for handling data synchronization between source and target tables. However, it’s crucial to understand the behavior of the WHEN MATCHED clause and consider additional criteria if you need more fine-grained control over the update or delete process. 🚀🔍

For more detailed examples and syntax, you can refer to the official Microsoft documentation on MERGE1.

 

View solution in original post

2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @BradWhen working with the MERGE statement in SQL, it’s essential to understand how it handles matching records between the source and target tables.

Let’s break down your scenario:

  1. Merge Key Not Unique on Both Source and Target:

    • If the merge key (in your case, the id) is not unique in both the source and target tables, the MERGE statement will indeed throw an error. It requires a unique match to perform the merge operation effectively.
  2. Merge Key Unique in Source but Not Unique in Target:

    • When the merge key is unique in the source but not unique in the target, the behavior depends on how you specify the WHEN MATCHED clause:
      • WHEN MATCHED THEN DELETE: This clause will delete all matching rows in the target table for the given merge key. In your example, if the source has a single row with id = 1, and the target has multiple rows with the same id = 1, executing WHEN MATCHED THEN DELETE will delete all those target rows with id = 1.
      • WHEN MATCHED THEN UPDATE: Similarly, if you use WHEN MATCHED THEN UPDATE, it will update all matching rows in the target for the specified merge key. Again, in your case, all target rows with id = 1 would be updated.
  3. Random Selection:

    • The MERGE statement itself does not inherently pick rows randomly. Instead, it processes all matching rows based on the specified action (delete or update).
    • If you want to control which specific row gets updated or deleted when there are multiple matches, you’ll need to add additional logic to your query. For example, you could use an additional column (such as a timestamp or an ordering column) to determine which row to update or delete.

In summary, the MERGE statement provides powerful capabilities for handling data synchronization between source and target tables. However, it’s crucial to understand the behavior of the WHEN MATCHED clause and consider additional criteria if you need more fine-grained control over the update or delete process. 🚀🔍

For more detailed examples and syntax, you can refer to the official Microsoft documentation on MERGE1.

 

Brad
Contributor

Cool, this is what I tested out. Great to get confirmed. Thanks. BTW, https://medium.com/@ritik20023/delta-lake-upserting-without-primary-key-f4a931576b0 has a workaround which can fix the merge with duplicate merge key on both source and target.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.