Databricks Community

marcuskw · ‎09-08-2023

Avoiding ConcurrentAppendException requires a good partitioning strategy. With mine, the "whenMatchedUpdate" and "whenNotMatchedInsert" logic works without fault.

When using "whenNotMatchedBySourceUpdate", however, the condition doesn't seem to isolate the specific partition in the Delta table.

So when merging a dataframe with that logic in parallel, we hit a ConcurrentAppendException even though the table is set up with "partition" as the partition column and each merge is constrained to one value of it.

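from pyspark.sql import functions as F

# deltaTable, df and partition are assumed to be defined earlier
(
    deltaTable.alias('t')
    # Pin the target partition so parallel merges touch disjoint partitions
    .merge(df.alias('c'), f"t.partition = '{partition}' AND t.id = c.id")
    .whenMatchedUpdate(set={
        "t.id": "c.id",
        "t.name": "c.name",
        "t.partition": "c.partition",
        "t.flag": "c.flag",
    })
    .whenNotMatchedInsert(values={
        "t.id": "c.id",
        "t.name": "c.name",
        "t.partition": "c.partition",
        "t.flag": "c.flag",
    })
    # Flag target rows in this partition that no longer exist in the source
    .whenNotMatchedBySourceUpdate(
        condition=f"t.partition = '{partition}'",
        set={
            "t.flag": F.lit(True),
        },
    )
    .execute()
)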

Have I misunderstood the merge syntax? Is it possible that whenNotMatchedBySourceUpdate scans the whole table and ignores the condition?
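To make the question concrete, here is a toy in-memory model of the three clauses in the merge above, with plain Python dicts standing in for rows. The function name toy_merge and the sample rows are made up for illustration. It only sketches which rows each clause logically touches; it says nothing about which files Delta actually scans or how write conflicts are detected.

```python
def toy_merge(target, source, partition):
    """Toy model of the merge in the post; NOT Delta Lake's implementation."""
    source_by_id = {row["id"]: row for row in source}
    matched_ids = set()
    result = []
    for t in target:
        c = source_by_id.get(t["id"])
        # Merge condition: t.partition = '<partition>' AND t.id = c.id
        if c is not None and t["partition"] == partition:
            matched_ids.add(t["id"])
            result.append(dict(c))  # whenMatchedUpdate: all columns from source
        elif t["partition"] == partition:
            # whenNotMatchedBySourceUpdate, restricted by its condition
            result.append({**t, "flag": True})
        else:
            # Not matched by source, but outside the partition: left untouched
            result.append(dict(t))
    # whenNotMatchedInsert: source rows that matched no target row
    for c in source:
        if c["id"] not in matched_ids:
            result.append(dict(c))
    return result


# Hypothetical sample data
target = [
    {"id": 1, "name": "a", "partition": "p1", "flag": False},
    {"id": 2, "name": "b", "partition": "p1", "flag": False},
    {"id": 3, "name": "c", "partition": "p2", "flag": False},
]
source = [
    {"id": 1, "name": "a2", "partition": "p1", "flag": False},
    {"id": 4, "name": "d", "partition": "p1", "flag": False},
]

result = toy_merge(target, source, "p1")
for row in result:
    print(row)
```

In this model only id 2 (same partition, missing from the source) gets flagged, and the row in partition p2 is unchanged. That is the logical behaviour the condition is meant to express, whatever the engine reads physically.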

marcuskw · ‎09-11-2023

https://docs.databricks.com/en/delta/merge.html

"By definition, whenNotMatchedBySource clauses do not have a source row to pull column values from, and so source columns can’t be referenced."

I found out that this was not functioning correctly in Databricks Runtime 12.2 but works in 13.3.
So it appears to have been fixed in the Delta Lake version that ships with the newer runtime.
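For anyone who cannot upgrade the runtime yet, one common workaround is to retry the merge when the conflict is raised. The following is only a minimal sketch of that idea and is not something from the Databricks docs: run_with_retry, its parameters, and the backoff values are hypothetical names and choices made for this example.

```python
import random
import time


def run_with_retry(action, retry_on, max_attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call action(); retry with exponential backoff if it raises retry_on.

    Hypothetical helper: retry_on is an exception class (or tuple of classes).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the original conflict
            # Exponential backoff plus jitter so parallel writers
            # do not all retry at the same moment.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

On Databricks the merge would be wrapped in a function and passed as action, with retry_on set to the conflict exception (assumed here to be ConcurrentAppendException imported from delta.exceptions). Retrying only hides the symptom, so upgrading the runtime is still the cleaner fix.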



whenNotMatchedBySourceUpdate ConcurrentAppendException Partition
