whenNotMatchedBySourceUpdate ConcurrentAppendException Partition

marcuskw
Contributor II

ConcurrentAppendException requires a good partitioning strategy. My logic works without fault for the "whenMatchedUpdate" and "whenNotMatchedInsert" clauses.
 
When using "whenNotMatchedBySourceUpdate", however, the condition doesn't seem to isolate the specific partition in the Delta table.
 
So when merging a DataFrame with that logic in parallel, we hit a ConcurrentAppendException even though the table is partitioned by the "partition" column.
 
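    from pyspark.sql import functions as F

    # deltaTable, df and partition are defined earlier in the job
    (
        deltaTable.alias('t')
            # restrict the match to a single partition of the target table
            .merge(df.alias('c'), f"t.partition = '{partition}' AND t.id = c.id")
            .whenMatchedUpdate( set =
                {
                    "t.id": "c.id"
                    ,"t.name": "c.name"
                    ,"t.partition": "c.partition"
                    ,"t.flag": "c.flag"
                }
            )
            .whenNotMatchedInsert( values =
                {
                    "t.id": "c.id"
                    ,"t.name": "c.name"
                    ,"t.partition": "c.partition"
                    ,"t.flag": "c.flag"
                }
            )
            # flag target rows in this partition that have no matching source row
            .whenNotMatchedBySourceUpdate( condition = f"t.partition = '{partition}'",
                set =
                {
                    "t.flag": F.lit(True)
                }
            )
            .execute()
    )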
 
Have I misunderstood the merge syntax, or does whenNotMatchedBySourceUpdate scan the whole table and ignore the condition?
Accepted Solution

https://docs.databricks.com/en/delta/merge.html

"By definition, whenNotMatchedBySource clauses do not have a source row to pull column values from, and so source columns can't be referenced."

I found that this did not work correctly on Databricks Runtime 12.2 but does work on 13.3, so it was presumably fixed in the Delta Lake version bundled with the newer runtime.
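
For reference, here is a minimal sketch of the merge from the question, written so that every condition is pinned to one partition and the whenNotMatchedBySource clause references only target columns, as the quoted documentation requires. The helper names `partition_conditions` and `merge_partition` are hypothetical, `delta_table` is assumed to be a `DeltaTable` and `df` a DataFrame with the columns from the question, and the flag is set with the SQL expression string "true" instead of `F.lit(True)` so the snippet needs no imports. It is only expected to avoid the conflict on a runtime where the behaviour is fixed (13.3 in my case).

```python
def partition_conditions(partition):
    """Build the two predicates used by the merge, both pinned to one partition.

    `partition` is assumed to be a trusted value (it is interpolated as-is).
    """
    merge_condition = f"t.partition = '{partition}' AND t.id = c.id"
    # Only target columns (alias t) appear here: there is no source row
    # to reference in a whenNotMatchedBySource clause.
    not_matched_by_source_condition = f"t.partition = '{partition}'"
    return merge_condition, not_matched_by_source_condition


def merge_partition(delta_table, df, partition):
    """Merge `df` into one partition of `delta_table` (a DeltaTable)."""
    merge_condition, nmbs_condition = partition_conditions(partition)
    columns = {
        "t.id": "c.id",
        "t.name": "c.name",
        "t.partition": "c.partition",
        "t.flag": "c.flag",
    }
    (
        delta_table.alias("t")
        .merge(df.alias("c"), merge_condition)
        .whenMatchedUpdate(set=columns)
        .whenNotMatchedInsert(values=columns)
        # Flag target rows in this partition that are missing from the source.
        .whenNotMatchedBySourceUpdate(
            condition=nmbs_condition,
            set={"t.flag": "true"},
        )
        .execute()
    )
```

Each parallel job would call `merge_partition(deltaTable, df, partition)` with its own partition value.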

