Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error "insert concurrent to Delta Lake" when 2 streaming merge data to same table at the same time

khangnguyen164
New Contributor II

Hello everyone,

We currently have two streaming jobs (Bronze jobs) created as two tasks in the same job, running on the same compute, and both merge data into the same table (a Silver table). With this setup, we sometimes get an "insert concurrent" error because Delta Lake blocks one of the writes.

But when I declare both streams in the same task, the error does not occur. I declared both streams in the same task (file): brz1 = df1.readStream...start() brz2 = df2.readStream...start()

I hope someone can explain why the "insert concurrent" error does not occur when I create the two streams in the same task.

 

3 REPLIES

khangnguyen164
New Contributor II

Sorry for sending the wrong information. I created the two streams in the same task: brz1 = df1.writeStream...start() brz2 = df2.writeStream...start(). Please ignore my typo above (readStream instead of writeStream); the example is written the way I actually did it.

Hello @khangnguyen164!

When both streaming writes run in the same task, they are processed sequentially, likely sharing the same transaction context. This prevents concurrent insert conflicts, as micro-batches are scheduled one after another within the same Spark context.

However, when the streams run in separate tasks, they execute independently. This parallel execution can lead to conflicts like ConcurrentAppendException when both tasks try to write to the same Delta table simultaneously.

For more details, check out the Concurrency control and Concurrent Writes to the same DELTA TABLE documentation.
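If the two streams must stay in separate tasks, a common mitigation suggested by the concurrency-control docs is to wrap each micro-batch MERGE in a retry loop that backs off when a concurrent-write conflict is raised. Below is a minimal sketch in plain Python; the exception class here is a stand-in for delta.exceptions.ConcurrentAppendException, and merge_fn represents whatever MERGE your foreachBatch function performs (both names are illustrative, not from the original thread).

```python
import random
import time


class ConcurrentAppendException(Exception):
    """Stand-in for delta.exceptions.ConcurrentAppendException."""


def merge_with_retry(merge_fn, max_retries=5, base_delay=0.1):
    """Run merge_fn, retrying with exponential backoff plus jitter
    whenever a concurrent-write conflict is raised."""
    for attempt in range(max_retries):
        try:
            return merge_fn()
        except ConcurrentAppendException:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Back off before retrying so the competing writer can commit.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

In a real job you would call merge_with_retry from inside each stream's foreachBatch function and catch the actual Delta exception. Partitioning the Silver table and adding disjoint partition predicates to each MERGE condition can also let Delta commit both writes without conflict.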

khangnguyen164
New Contributor II

Can anyone else help me with this case? 😞