cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
joe_widen
Databricks Employee
Databricks Employee

In DBR 16.1+, we’ve improved functionality of MERGE operations where multiple rows of the source dataset match the same row of the target Delta table, but only one row matches the WHEN MATCHED condition.  In the past, these operations would fail with DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE.

Prior DBR’s MERGE would only use the MERGE conditions to detect multiple rows from the source dataset trying to update the target dataset leading to ambiguity.  This improved functionality allows for the WHEN MATCHED condition to help resolve the ambiguity.

As an example, let's take a look at a MERGE that will now succeed in DBR 16.1+.

MERGE INTO t USING s
ON t.id = s.id 
WHEN MATCHED AND s.value = 'b' 
THEN UPDATE SET *
WHEN NOT MATCHED 
THEN INSERT *

Target Table

ID

Value

1

a

Source Table

ID

Value

1

b

1

c

 

In this example, the source has two matching rows that could be used to update the target table.  The WHEN MATCHED condition reduces the matching rows from two matches to one match.  When running this with an older DBR, it would fail with DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE, even though the WHEN MATCHED condition reduced the matches to one.  The improved functionality in DBR 16.1+ will allow this to succeed

2 Comments
BamBam
New Contributor III

What happens in DBRS 16.1 if the WHEN condition is this:

WHEN MATCHED AND s.value IS DISTINCT FROM t.value

Basically saying when it matches on the PK but then check if the other columns are different before taking an action like UPDATE.

Mantsama4
Contributor III

This is a great solution! The improved duplicate match handling in Delta MERGE operations in DBR 16.1+ is a significant enhancement, as it resolves ambiguity and ensures smoother operations by leveraging the WHEN MATCHED condition. This update not only simplifies workflows but also reduces errors, making data management more efficient. Well done!

I have a humble question: Are there any recommended cost optimization strategies that can be applied alongside this improved functionality, especially for large-scale MERGE operations? For instance, how can we ensure efficient resource utilization while maintaining performance during such operations? Thank you!