Data Engineering

whenNotMatchedBySourceUpdate ConcurrentAppendException Partition

marcuskw
Contributor II

Avoiding ConcurrentAppendException requires a good partitioning strategy. My logic works without fault for "whenMatchedUpdate" and "whenNotMatchedInsert".
 
With "whenNotMatchedBySourceUpdate", however, the condition doesn't seem to isolate the specific partition in the Delta table.
 
So when we merge a DataFrame with that logic in parallel runs, we hit a ConcurrentAppendException, even though the table is partitioned on "partition" and every condition is constrained to a single partition value.
 
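    from pyspark.sql import functions as F

    # deltaTable, df and partition are defined earlier in the job
    (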
        deltaTable.alias('t')
            .merge(df.alias('c'), f" t.partition= '{partition}' AND t.id= c.id")
            .whenMatchedUpdate( set =
                {
                    "t.id": "c.id"
                    ,"t.name": "c.name"
                    ,"t.partition": "c.partition"
                    ,"t.flag": "c.flag"
                }
            )
            .whenNotMatchedInsert( values =
                {
                    "t.id": "c.id"
                    ,"t.name": "c.name"
                    ,"t.partition": "c.partition"
                    ,"t.flag": "c.flag"
                }
            )
            .whenNotMatchedBySourceUpdate( condition = f"t.partition= '{partition}'",
            set  ={
                    "t.flag": F.lit(True)
                }
            )
            .execute()
    )
 
Have I misunderstood the merge syntax? Does whenNotMatchedBySourceUpdate possibly scan the whole table and ignore the condition?
Accepted Solutions

https://docs.databricks.com/en/delta/merge.html

"By definition, whenNotMatchedBySource clauses do not have a source row to pull column values from, and so source columns can’t be referenced."

I found that this did not work correctly on Databricks Runtime 12.2 but does work on 13.3.
So it was presumably fixed in the Delta Lake version that ships with the newer runtime.
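
For reference, here is a minimal sketch of how the partition-pinned merge from the question could be wrapped so that each parallel job touches only its own partition. The helper names (`partition_conditions`, `merge_partition`) are hypothetical, the column list is copied from the question, and the SQL expression string `"true"` stands in for `F.lit(True)`. `delta_table` is assumed to be a `delta.tables.DeltaTable` and `df` a Spark DataFrame. Treat it as an illustration, not an official recipe.

```python
def partition_conditions(partition: str):
    """Build the two condition strings for a merge pinned to one partition.

    Returns (merge_condition, not_matched_by_source_condition).
    Hypothetical helper; aliases 't' (target) and 'c' (source) follow the question.
    """
    if "'" in partition:
        # Keep the sketch simple: refuse values that would break the SQL literal.
        raise ValueError("partition value must not contain a single quote")
    pin = f"t.partition = '{partition}'"
    merge_condition = f"{pin} AND t.id = c.id"
    # whenNotMatchedBySource clauses have no source row, so this condition
    # may only reference target columns.
    return merge_condition, pin


def merge_partition(delta_table, df, partition: str) -> None:
    """Merge df into delta_table, restricting every clause to one partition.

    Assumes delta_table is a delta.tables.DeltaTable and df a Spark DataFrame.
    """
    merge_condition, by_source_condition = partition_conditions(partition)
    columns = {f"t.{name}": f"c.{name}" for name in ("id", "name", "partition", "flag")}
    (
        delta_table.alias("t")
        .merge(df.alias("c"), merge_condition)
        .whenMatchedUpdate(set=columns)
        .whenNotMatchedInsert(values=columns)
        # Flag target rows in this partition that are missing from the source.
        .whenNotMatchedBySourceUpdate(
            condition=by_source_condition,
            set={"t.flag": "true"},
        )
        .execute()
    )
```

Each parallel job would call `merge_partition(deltaTable, df, partition)` with its own partition value. Whether concurrent runs then avoid the conflict still depends on the runtime version, as described above.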
