Databricks Free Trial Help
Engage in discussions about the Databricks Free Trial within the Databricks Community. Share insights, tips, and best practices for getting started, troubleshooting issues, and maximizing the value of your trial experience to explore Databricks' capabilities effectively.

Concurrency in replaceWhere()

arnavsood
New Contributor

Hi Databricks team

 

I had a quick question and would appreciate your guidance.

 

Let’s say I have a Delta table (not partitioned), and I'm using overwrite mode with the replaceWhere option to overwrite data for city = 'LA' and city = 'NY' in two separate jobs running in parallel. Both jobs write to the same target Delta table.
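For concreteness, each job does roughly the following. This is only a sketch: the helper name `overwrite_city` and the `table_path` argument are placeholders, and `df` is assumed to be a Spark DataFrame that contains only rows for the given city.

```python
def overwrite_city(df, table_path, city):
    """Overwrite only the rows for one city in an existing Delta table.

    Assumptions (placeholders, not from a real job):
    - `df` is a Spark DataFrame holding only rows where city == `city`
    - `table_path` is the location of the target Delta table
    - `city` is a trusted value (it is interpolated into the predicate)
    """
    predicate = f"city = '{city}'"
    (
        df.write.format("delta")
        .mode("overwrite")
        # Only rows matching the predicate are replaced; other rows are kept.
        .option("replaceWhere", predicate)
        .save(table_path)
    )
    return predicate
```

One job would call `overwrite_city(la_df, table_path, "LA")` while the other calls `overwrite_city(ny_df, table_path, "NY")` at the same time.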

 

Since the rows are isolated but the table is not partitioned, my question is:

 

> Will Delta Lake use deletion vectors or any form of row-level concurrency control to safely handle these parallel overwrite operations, specifically overwrites that use the replaceWhere option?

 

The rows are isolated and don't overlap, but the underlying files might (a single file can contain both NY and LA rows).

 

 

Or is there still a risk of file-level conflicts even though the replaceWhere clauses target different cities?

 

Thanks in advance for your help.

 

Best regards,

Arnav

1 REPLY

SP_6721
Contributor III

Hi @arnavsood ,

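As I understand it, even if the parallel jobs use overwrite with replaceWhere to update different rows, file-level conflicts can still occur. That’s because Delta Lake doesn’t provide row-level concurrency by default: conflicts are detected per data file, so if a single file holds both NY and LA rows, both jobs need to rewrite that file and one of the commits can fail.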
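If you keep running the jobs in parallel without row-level concurrency, one common way to cope is to retry the write that loses the conflict. Below is a minimal sketch, not an official pattern: `retry_on_conflict` and its parameters are hypothetical names, and it matches Delta's conflict exceptions (the classes in `delta.exceptions`) by class name so the snippet does not require delta-spark to be importable. It also assumes the write is safe to repeat, which holds for a replaceWhere overwrite of the same data.

```python
import random
import time

# Class names of Delta Lake's concurrent-write exceptions (delta.exceptions).
# Matching by name is an assumption made to keep this sketch dependency-free.
CONFLICT_NAMES = {
    "ConcurrentAppendException",
    "ConcurrentDeleteReadException",
    "ConcurrentDeleteDeleteException",
}


def retry_on_conflict(write_fn, max_attempts=3, base_delay_s=1.0):
    """Call write_fn(), retrying when it raises a Delta write-conflict error.

    Any other exception, or a conflict on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except Exception as exc:
            is_conflict = type(exc).__name__ in CONFLICT_NAMES
            if not is_conflict or attempt == max_attempts:
                raise
            # Linear backoff plus jitter so the two jobs do not retry in lockstep.
            time.sleep(base_delay_s * attempt + random.uniform(0, base_delay_s))
```

Each job would wrap its own write, for example `retry_on_conflict(lambda: write_city_rows())`, where `write_city_rows` stands for whatever function performs the replaceWhere overwrite.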

Row-level concurrency becomes available on Databricks Runtime 14.2 or above when deletion vectors are enabled on a non-partitioned table (or when the table uses liquid clustering).
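Deletion vectors are controlled by the `delta.enableDeletionVectors` table property. A small sketch of turning it on for an existing table is shown here; the helper name `enable_deletion_vectors_sql` and the table name in the usage line are hypothetical, and `spark` is assumed to be the SparkSession that Databricks notebooks provide.

```python
def enable_deletion_vectors_sql(table_name):
    """Build the ALTER TABLE statement that enables deletion vectors.

    `table_name` is a placeholder for your own (trusted) table identifier.
    """
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)"
    )


# Usage in a notebook (table name is illustrative only):
# spark.sql(enable_deletion_vectors_sql("main.default.city_data"))
```

It is worth testing your two replaceWhere jobs after enabling this, since I have not verified how row-level concurrency applies to this specific write pattern.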
