Need help with ConcurrentAppendException during Delta merge

bikash84
New Contributor III

I have two datasets being loaded into a common silver table. The loads are event-driven: a notebook is triggered whenever a file is dropped into the storage account. When files for both datasets arrive at the same time, one of the loads fails with a ConcurrentAppendException. Can someone please help with this problem?

1 ACCEPTED SOLUTION

yuvapraveen_k
New Contributor III

Delta Lake has provided ACID guarantees since the format's inception. To ensure the C (consistency), it prevents concurrent workflows from committing conflicting updates at the same time, just like other ACID-compliant SQL engines. The key difference is that some SQL engines block concurrent writers at the beginning of a merge by taking a lock, whereas Delta uses optimistic concurrency control: each writer reads the state of the Delta transaction log when it starts, proceeds on the assumption that no other process is updating the table, and checks the log again at commit time. When one process finishes writing its files and commits a new version to the log, the other process's commit is rejected because the table state it read is no longer current; that rejection surfaces as the ConcurrentAppendException. That is how it works under the hood, but what is the solution?

1. Retry the failed run, either at the workflow level (job retry policies) or with Python exception handling around the merge (see the retry sketch after this list).

2. Or, ensure that the two processes write to different partitions of the Delta table, and state that partition filter explicitly in the merge condition, so that neither one overlaps the files created by the conflicting process (see the MERGE sketch after this list).
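For option 1, here is a minimal retry sketch, assuming a silver table named silver.events keyed on event_id and a staged DataFrame updates_df; all of these names are placeholders for illustration. The exception is raised on the JVM side, so matching the class name in the error message is a pragmatic way to detect it from Python:

```python
import time
from delta.tables import DeltaTable

def merge_with_retry(merge_fn, max_attempts=5, base_delay_sec=5):
    """Run merge_fn, retrying when a Delta write conflict is detected."""
    for attempt in range(1, max_attempts + 1):
        try:
            merge_fn()
            return
        except Exception as e:
            # ConcurrentAppendException originates in the JVM; check the message.
            if "ConcurrentAppendException" in str(e) and attempt < max_attempts:
                # Back off so the competing commit can finish; the merge
                # re-reads the latest table snapshot on the next attempt.
                time.sleep(base_delay_sec * attempt)
            else:
                raise

# spark is the ambient SparkSession in a Databricks notebook.
silver = DeltaTable.forName(spark, "silver.events")  # placeholder table name

def do_merge():
    (silver.alias("t")
        .merge(updates_df.alias("s"), "t.event_id = s.event_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

merge_with_retry(do_merge)
```

For option 2, a sketch of the partition-scoped merge, assuming the table is partitioned by a source_system column and each notebook loads exactly one source (again, hypothetical names). Pinning the partition value as a literal in the merge condition lets Delta's conflict detection see that the two concurrent merges touch disjoint sets of files, so neither commit is rejected:

```python
(silver.alias("t")
    .merge(
        updates_df.alias("s"),
        # Literal partition predicate: each notebook pins its own value.
        "t.source_system = 'dataset_a' AND t.event_id = s.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```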

Hope this helps...


