Does Databricks lock a file in ADLS Gen2 before writing (appending) to it? If yes, how can we check whether the file is locked?

Akshith_Rajesh
New Contributor III

I have a requirement: I am running two notebooks in parallel, and both want to overwrite the same file.

If two notebooks try to overwrite the file at the same time, will I lose data because of the simultaneous overwrite?

I want to read the file, append some new rows, and then overwrite it.

How can we handle this situation?
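For reference, here is a minimal sketch of the read-append-overwrite pattern described above, assuming a Delta table at a hypothetical ADLS Gen2 path with a placeholder (id, value) schema:

```python
# Minimal sketch of the read-append-overwrite pattern. The path and the
# (id, value) schema are placeholders, not anything from this thread.
path = "abfss://container@account.dfs.core.windows.net/path/to/table"

df = spark.read.format("delta").load(path)                     # read current contents
new_rows = spark.createDataFrame([(1, "a")], ["id", "value"])  # rows to add

# The read-then-overwrite step is exactly where two concurrent notebooks
# race: both read the same snapshot, then each tries to replace the file.
df.unionByName(new_rows).write.format("delta").mode("overwrite").save(path)
```

If the goal is only to add rows, writing with mode("append") instead avoids the read-overwrite race entirely.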

4 REPLIES

daniel_sahal
Esteemed Contributor

@Rajesh Akshith

Delta is ACID compliant, so two writers hitting the same file in parallel will cause one of them to fail rather than silently losing data.

https://docs.databricks.com/lakehouse/acid.html#how-does-databricks-implement-consistency

How you can handle this situation depends on the use case.

I would suggest partitioning the data, so the parallel processes read/write different files; see the sketch below.
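A minimal sketch of that idea, assuming a Delta table partitioned by a hypothetical "source" column with one value per notebook:

```python
from pyspark.sql import functions as F

# Hypothetical ADLS Gen2 path for the shared Delta table
base_path = "abfss://container@account.dfs.core.windows.net/path/to/table"

# Each notebook tags its rows with its own partition value...
df_a = spark.createDataFrame([(1, "x")], ["id", "value"]).withColumn(
    "source", F.lit("A"))

# ...and overwrites only that partition, so concurrent writers touch
# disjoint files and Delta's concurrency checks don't conflict.
(df_a.write.format("delta")
      .mode("overwrite")
      .partitionBy("source")
      .option("replaceWhere", "source = 'A'")
      .save(base_path))
```

A second notebook would run the same write with source = 'B', so the two jobs never overwrite each other's data.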

Tayyab_Vohra
Contributor

Hi @Rajesh Akshith,

Wouldn't the better idea be to run the notebooks simultaneously but write to different files? While writing the data you can add a datetime column, and after writing you can merge the files together into one.

This whole process can be achieved within the same notebook or in separate ones, depending on your use case.
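A minimal sketch of that approach, with hypothetical staging and final paths and a placeholder schema:

```python
from pyspark.sql import functions as F

# Each notebook writes to its own staging path, stamping rows with the
# time they were written (paths and schema are placeholders).
out_a = "abfss://container@account.dfs.core.windows.net/staging/notebook_a"
out_b = "abfss://container@account.dfs.core.windows.net/staging/notebook_b"

df_a = spark.createDataFrame([(1, "x")], ["id", "value"])
df_a.withColumn("written_at", F.current_timestamp()) \
    .write.format("delta").mode("overwrite").save(out_a)

# Later (in the same notebook or a third one), merge the per-notebook
# outputs into the final table.
merged = (spark.read.format("delta").load(out_a)
          .unionByName(spark.read.format("delta").load(out_b)))
merged.write.format("delta").mode("overwrite") \
      .save("abfss://container@account.dfs.core.windows.net/final/table")
```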

Debayan
Esteemed Contributor III

Hi,

ADLS Gen1 was limited and used to lock the file; ADLS Gen2, however, supports concurrent access (within certain limits).
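Databricks itself doesn't surface a file lock, but since ADLS Gen2 is blob-backed you can inspect a file's lease status with the Azure SDK. A minimal sketch, assuming the azure-storage-blob package and hypothetical account/container/path values:

```python
from azure.storage.blob import BlobClient

# All names below are placeholders for your own storage account details.
blob = BlobClient(
    account_url="https://youraccount.blob.core.windows.net",
    container_name="yourcontainer",
    blob_name="path/to/file.parquet",
    credential="your-account-key",
)

props = blob.get_blob_properties()
# lease.status is 'locked' when some client holds a lease on the file,
# 'unlocked' otherwise; lease.state gives more detail (e.g. 'leased').
print(props.lease.status, props.lease.state)
```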

I found two online articles which may help:

https://stackoverflow.com/questions/58301154/datalake-locks-on-read-and-write-for-the-same-file

https://social.msdn.microsoft.com/Forums/en-US/8d354c9b-588d-44de-83a1-bac28acc2085/adls-gen2-concur...

Please let us know if this helps.

Also, please tag @Debayan in your next response, which will notify me. Thank you!

Anonymous
Not applicable

Hi @Rajesh Akshith,

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
