Databricks Free Edition Help

Merge Conflicts During Concurrent Delta Table Updates

SantiNath_Dey
Contributor

I have a requirement to run multiple concurrent MERGE operations against the same Delta table. However, some of the merges fail with conflicts while others succeed. How can I handle this?
I have tried the following options:
1. Partitioning by primary key: I ran the merge updates against the partition key, but I still get merge conflicts.
2. Retrying: I added a retry mechanism, but the merge conflicts persist.

Is there an option to overcome this issue?

Thanks in advance.

Accepted Solutions

Lu_Wang_ENB_DBX
Databricks Employee

Here are your options:

  1. Best fix: stop using partitioned target tables for concurrent MERGE. Partitioned Delta tables do not support row-level concurrency. If you can, move the target to an unpartitioned table with liquid clustering and deletion vectors enabled. For MERGE, row-level concurrency requires DBR 14.3 LTS or above (or DBR 14.2 with Photon). This is the cleanest way to reduce concurrent merge conflicts.

  2. If you must keep partitions, make the MERGE predicate fully explicit. Don't join on the primary key alone: include the target partition filters in the MERGE condition so each job reads only its own partition slice. Otherwise, separate jobs can still scan overlapping data and conflict.

  3. If jobs can touch the same rows, serialize the writes. Even with row-level concurrency, operations that read or modify the same rows still conflict, and under high concurrency some conflicts can occur regardless. In that case, land the data in a staging table and run a single downstream MERGE job (or queue writes by business key) instead of running many concurrent MERGEs directly against the target.
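To make Option 2 concrete, here is a minimal Python sketch that builds such a partition-scoped MERGE statement as a string. The function name `build_partition_scoped_merge`, the tables `main.sales.orders` and `staging.orders_updates`, and the `region` partition column are hypothetical; the sketch assumes the target is partitioned by the columns passed in `partition_values` and that each concurrent job owns exactly one slice. The source is filtered to the same slice as well, because a source row from another partition would fail the literal `t.region = 'EU'` match and could then be inserted as a spurious new row by the `WHEN NOT MATCHED` branch.

```python
import re

# Conservative patterns: reject anything that could break out of the generated SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_TABLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}")
_LITERAL = re.compile(r"[A-Za-z0-9_\-]+")


def build_partition_scoped_merge(target, source, key_cols, partition_values):
    """Build a MERGE whose condition pins both sides to one partition slice (hypothetical helper)."""
    if not (_TABLE.fullmatch(target) and _TABLE.fullmatch(source)):
        raise ValueError("unexpected table name")
    if not key_cols or not partition_values:
        raise ValueError("need at least one key column and one partition filter")

    on_terms = []
    source_filters = []
    for col in key_cols:
        if not _IDENT.fullmatch(col):
            raise ValueError(f"bad key column: {col!r}")
        on_terms.append(f"t.{col} = s.{col}")
    for col, value in partition_values.items():
        value = str(value)  # assumption: every partition value is emitted as a string literal
        if not (_IDENT.fullmatch(col) and _LITERAL.fullmatch(value)):
            raise ValueError(f"bad partition filter: {col!r}={value!r}")
        # Literal filter on the target, so this job only reads its own slice.
        on_terms.append(f"t.{col} = '{value}'")
        # Same filter on the source, so rows from other slices are never inserted here.
        source_filters.append(f"{col} = '{value}'")

    return (
        f"MERGE INTO {target} AS t "
        f"USING (SELECT * FROM {source} WHERE {' AND '.join(source_filters)}) AS s "
        f"ON {' AND '.join(on_terms)} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    print(build_partition_scoped_merge(
        "main.sales.orders", "staging.orders_updates", ["order_id"], {"region": "EU"}
    ))
```

On Databricks, the job responsible for that slice would then pass the resulting string to `spark.sql(...)`. The regex checks are a deliberately conservative guard against malformed names and values, not a complete SQL escaping scheme; parameterized queries are a more robust choice where your runtime supports them.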

Recommendation: Option 1 if you can change the table design. If not, use Option 2, and if the same keys can arrive together, add Option 3.
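For Option 3, the sketch below shows two ideas in plain Python, without Spark: routing incoming records to writer queues by a stable hash of the business key, so that no two concurrent writers handle the same key, and collapsing a staging batch to one row per key before the single downstream MERGE (Delta fails a MERGE when several source rows match and try to update the same target row). The names `route_by_key` and `latest_per_key` and the `id`/`ts` fields are hypothetical. On a partitioned target, disjoint keys alone do not prevent conflicts, because row-level concurrency is unavailable there, so this still needs Option 1 or Option 2 alongside it.

```python
import zlib
from collections import defaultdict


def route_by_key(records, key_field, num_writers):
    """Assign each record to one of num_writers queues by a stable hash of its key (hypothetical).

    Every record with the same key lands in the same queue, so if each queue is
    drained by exactly one writer, no two writers MERGE the same key concurrently.
    zlib.crc32 is used instead of hash(), which is salted per process for str.
    """
    if num_writers < 1:
        raise ValueError("num_writers must be at least 1")
    queues = defaultdict(list)
    for record in records:
        key_bytes = str(record[key_field]).encode("utf-8")
        queues[zlib.crc32(key_bytes) % num_writers].append(record)
    return dict(queues)


def latest_per_key(records, key_field, ts_field):
    """Collapse a staging batch to the newest record per key (one source row per key)."""
    latest = {}
    for record in records:
        key = record[key_field]
        if key not in latest or record[ts_field] > latest[key][ts_field]:
            latest[key] = record
    return list(latest.values())


if __name__ == "__main__":
    staged = [
        {"id": "a", "ts": 1, "value": 10},
        {"id": "b", "ts": 1, "value": 20},
        {"id": "a", "ts": 2, "value": 11},
    ]
    for writer, batch in sorted(route_by_key(staged, "id", num_writers=4).items()):
        print(writer, latest_per_key(batch, "id", "ts"))
```

In Spark, the collapse step would usually be written with a window function, for example `row_number()` over a window partitioned by the key and ordered by timestamp descending, keeping only row 1.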


SantiNath_Dey
Contributor

Thanks for your response.