ConcurrentAppendException After Enabling Liquid Clustering and Row-Level Concurrency on a Delta Table

NhanNguyen
Contributor III

Every time I run parallel jobs, they fail with this error:

ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.

I did a lot of research and enabled liquid clustering and row-level concurrency on the table, but it still fails.

Note: I can get past it with a retry, but my boss doesn't like that approach. Can anyone help me with this case?

Thank you very much!

Please see the details of my table configuration in the attached picture.

8 REPLIES

NhanNguyen
Contributor III

Note: I tried both DBR 13.3.x and 14.3.x but still failed with the same error.

cgrant
Databricks Employee

See this chart for a description of how row-level concurrency works. With row-level concurrency, concurrent merge operations still cannot safely modify the same row. Without row-level concurrency, concurrent merge operations cannot safely modify the same partition.

Here are two strategies for handling this (there may be more):

  • With each merge statement, include a disjoint predicate so that only one merge touches a given swath of data at any point in time (see the sketch after this list). For example, suppose a table contains data for several global regions (AMER, EUROPE, ASIA) and we want to merge into it concurrently. In merge 1, provide region = 'AMER' as an extra predicate; in merge 2, include region = 'EUROPE'; and so on.
  • Union your operations together and avoid the concurrency issue altogether
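
To make the first strategy concrete, here is a minimal PySpark sketch, assuming a hypothetical Delta table named sales with id and region columns; the per-region source DataFrames (amer_updates, europe_updates) are also placeholders:

from delta.tables import DeltaTable

# Hypothetical target table for this sketch
target = DeltaTable.forName(spark, "sales")

def merge_region(source_df, region):
    # The extra region predicate confines each merge to a disjoint slice
    # of the table, so concurrent writers never touch the same rows.
    (target.alias("t")
        .merge(
            source_df.alias("s"),
            f"t.id = s.id AND t.region = '{region}' AND s.region = '{region}'")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

merge_region(amer_updates, "AMER")      # e.g. run from job 1
merge_region(europe_updates, "EUROPE")  # e.g. run from job 2, concurrently

For the predicates to actually prevent conflicts, the column they filter on should line up with the table's partitioning or clustering keys; otherwise the concurrent writers may still read overlapping files and conflict.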

Please be aware that target tables with identity columns enabled do not support concurrent operations, regardless of disjoint predicates.

NhanNguyen
Contributor III

Hi @cgrant,

Thanks for checking. Could you please help me review this? I also added the disjoint predicate; here is my merge condition:

[
'target.ClientCode = source.ClientCode',
'target.ReportingPeriodCadenceID <=> source.ReportingPeriodCadenceID',
'target.ReportingPeriodID <=> source.ReportingPeriodID',
'target.ClientCode = source.ClientCode',
'target.AccountReferenceID = source.AccountReferenceID',
'target.ReportingPeriodID = 867',
"target.ClientCode = 'AMEX'"
]
I specified the two columns the table is clustered by ('target.ReportingPeriodID = 867' and "target.ClientCode = 'AMEX'"), but it still fails with the same issue.
NhanNguyen_0-1732679527493.png

cgrant
Databricks Employee

What do the other merge commands look like?

NhanNguyen
Contributor III

Hi @cgrant,

Here is my full merge and condition:

NhanNguyen_0-1732778140343.png
Thanks for reviewing.

NhanNguyen
Contributor III

Hi @cgrant,
Do you have any updates on this? Thanks, and I really appreciate your help!

cgrant
Databricks Employee

Thanks for sharing. In the original screenshots I've noticed that you've set delta.isolationLevel to Serializable, which is the strongest (and most strict) level. Please try WriteSerializable, which is the default level.
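
For reference, a minimal sketch of how that property can be changed (the three-part table name is a placeholder for your own table):

# Switch the table back to Delta's default isolation level
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table
    SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable')
""")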

NhanNguyen
Contributor III

Hi @cgrant,

Yes, I just tried changing delta.isolationLevel to WriteSerializable, but it still failed. Please see my screenshot:

NhanNguyen_0-1733301440819.png
