whenNotMatchedBySourceUpdate ConcurrentAppendException Partition

marcuskw
Contributor

ConcurrentAppendException requires a good partitioning strategy, and my logic works without fault for the

"whenMatchedUpdate" and "whenNotMatchedInsert" clauses.

When using "whenNotMatchedBySourceUpdate", however, the condition doesn't seem to isolate the specific partition in the Delta table.

So when merging a DataFrame with that logic, we hit a ConcurrentAppendException when running in parallel, even though the merge condition constrains the "partition" column.
 
    (
        deltaTable.alias('t')
            .merge(df.alias('c'), f"t.partition = '{partition}' AND t.id = c.id")
            .whenMatchedUpdate(set =
                {
                    "t.id": "c.id",
                    "t.name": "c.name",
                    "t.partition": "c.partition",
                    "t.flag": "c.flag"
                }
            )
            .whenNotMatchedInsert(values =
                {
                    "t.id": "c.id",
                    "t.name": "c.name",
                    "t.partition": "c.partition",
                    "t.flag": "c.flag"
                }
            )
            .whenNotMatchedBySourceUpdate(condition = f"t.partition = '{partition}'",
                set = {
                    "t.flag": F.lit(True)
                }
            )
            .execute()
    )
 
Have I misunderstood the merge syntax, or does whenNotMatchedBySourceUpdate perhaps scan the whole table and ignore the condition?
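Not part of the original question, but a common mitigation while the root cause is investigated is to wrap the merge in a retry loop with backoff, since ConcurrentAppendException is a transient conflict that often succeeds on retry. A minimal sketch, assuming the merge is passed in as a zero-argument callable and the retryable exception types are supplied by the caller (recent delta-spark releases expose `delta.exceptions.ConcurrentAppendException` for this purpose):

```python
import random
import time


def merge_with_retry(run_merge, retryable_exceptions, max_attempts=5, base_delay_s=1.0):
    """Run a Delta merge, retrying with jittered exponential backoff on
    concurrent-write conflicts.

    run_merge: zero-argument callable performing the merge.
    retryable_exceptions: tuple of exception types to retry on,
        e.g. (delta.exceptions.ConcurrentAppendException,).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_merge()
        except retryable_exceptions:
            if attempt == max_attempts:
                raise  # out of attempts, surface the conflict to the caller
            # Exponential backoff with jitter to de-synchronise parallel writers.
            time.sleep(base_delay_s * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
```

Usage against the snippet above would look like `merge_with_retry(lambda: deltaTable.alias('t').merge(...).whenMatchedUpdate(...).execute(), (ConcurrentAppendException,))`. This does not remove the conflict, it only retries past it; the exception names and import path above are assumptions to verify against your Delta Lake version.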
1 ACCEPTED SOLUTION


https://docs.databricks.com/en/delta/merge.html

"By definition, whenNotMatchedBySource clauses do not have a source row to pull column values from, and so source columns can't be referenced."

I found that this was not functioning correctly on runtime 12.2 but works on 13.3, so it must have been fixed in the Delta Lake version bundled with the newer runtime.


2 REPLIES

Kaniz
Community Manager

Hi @marcuskw. Based on the provided information and code snippet, it seems that the condition in the whenNotMatchedBySourceUpdate clause does not isolate the specific partition in the Delta table.

This can lead to a ConcurrentAppendException when running the merge operation in parallel, even though the table is set up with partitioning as a constraint.

To avoid this issue, you must make the separation explicit in the operation's condition. In the provided code snippet, you can modify the condition in the whenNotMatchedBySourceUpdate clause so that it pins down the partition column, consistent with the partition filter in the merge condition.

Here's an updated version of the code snippet:

    (
        deltaTable.alias('t')
            .merge(df.alias('c'), f"t.partition = '{partition}' AND t.id = c.id")
            .whenMatchedUpdate(set =
                {
                    "t.id": "c.id",
                    "t.name": "c.name",
                    "t.partition": "c.partition",
                    "t.flag": "c.flag"
                }
            )
            .whenNotMatchedInsert(values =
                {
                    "t.id": "c.id",
                    "t.name": "c.name",
                    "t.partition": "c.partition",
                    "t.flag": "c.flag"
                }
            )
            .whenNotMatchedBySourceUpdate(condition = f"t.partition = '{partition}'",
                set = {
                    "t.flag": F.lit(True)
                }
            )
            .execute()
    )

By including the t.partition = '{partition}' condition in the whenNotMatchedBySourceUpdate clause, you ensure that only the specific partition is scanned and updated, reducing the chances of a ConcurrentAppendException during parallel execution. Note that the condition can only reference target columns, since whenNotMatchedBySource clauses have no source row to read from.
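As a side note not raised in the thread: interpolating `partition` directly into the condition f-string is fragile if the value ever contains a single quote. A tiny helper along these lines (the name and signature are illustrative, not part of any Delta API) keeps the predicate construction in one place and escapes the value:

```python
def partition_condition(column: str, value: str, alias: str = "t") -> str:
    """Build a SQL predicate pinning a merge clause to one partition value.

    Escapes single quotes in the value ('' is standard SQL escaping) so the
    resulting predicate stays well-formed.
    """
    escaped = value.replace("'", "''")
    return f"{alias}.{column} = '{escaped}'"
```

With this, the clause becomes `.whenNotMatchedBySourceUpdate(condition=partition_condition("partition", partition), ...)`, e.g. `partition_condition("partition", "2024-01")` yields `"t.partition = '2024-01'"`.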

