
spark.read.parquet() - how to check for file lock before reading? (azure)

jakubk
Contributor

I have some Python code that takes Parquet files from an ADLS Gen2 location and merges them into Delta tables (run as a workflow job on a schedule).

I have a try/except wrapper around this so that any files that fail get moved into a failed folder using dbutils.fs.mv, while the files that get processed are archived off to a different location.

One scenario I've encountered is this:

- an external upload process is uploading somefile.parquet to ADLS Gen2

- the workflow job starts

- spark.read.parquet() fails with: Caused by: java.io.IOException: Could not read footer for file:

- dbutils.fs.mv moves the file (boo)

- the external process fails because mv has removed the target while the upload is still in progress

I'd assumed that mv would fail because there would be an exclusive lock on the file while it's being uploaded, but that's not the case (??)

Any suggestions on how to handle this?

Is there a way for me to check if a file is locked/being written to?
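One lock-independent way to approach this question: a complete Parquet file begins and ends with the 4-byte magic `PAR1`, and the footer is written last, so a file that is still being uploaded will normally be missing the trailing magic. The sketch below is a heuristic, not a lock check. It assumes the file is reachable through an ordinary file path (for example a mounted or FUSE path, which is an assumption about the environment) and that the footer is not encrypted. The function name is made up for illustration.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_like_complete_parquet(local_path: str) -> bool:
    """Heuristic: True if the file starts and ends with the Parquet magic bytes.

    Assumes local_path is readable with normal file I/O (hypothetical setup).
    """
    size = os.path.getsize(local_path)
    # Smallest plausible file: 4-byte leading magic, 4-byte footer length,
    # 4-byte trailing magic. A 0-byte placeholder fails here.
    if size < 12:
        return False
    with open(local_path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Files that fail this check can simply be left in place for the next scheduled run instead of being moved to the failed folder.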

What's the exception to catch for this error? I've spent an hour or more trying to figure it out, but the generic Python ones don't cover it and I get a NameError for the specific Spark ones I try.
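On the NameError: that is Python reporting that the exception class name was never imported, not that the Spark error is uncatchable. Classes such as `Py4JJavaError` (in `py4j.protocol`) or `AnalysisException` (in `pyspark.sql.utils`) have to be imported before they can appear in an `except` clause. Which class ends up wrapping the footer error may depend on the Spark version, so the sketch below sidesteps that by catching `Exception` and matching on the message text from the post. `process_file`, `read_fn`, `on_failed` and `on_in_progress` are hypothetical names standing in for the job's own read, move-to-failed and leave-alone steps.

```python
FOOTER_ERROR_MARKER = "Could not read footer"


def is_footer_read_error(exc: BaseException) -> bool:
    """True if exc, or anything in its cause/context chain, mentions the footer error."""
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if FOOTER_ERROR_MARKER in str(current):
            return True
        current = current.__cause__ or current.__context__
    return False


def process_file(path, read_fn, on_failed, on_in_progress):
    """Run read_fn(path) and route failures by kind.

    All three callables are placeholders (assumptions): read_fn would wrap
    the spark.read.parquet() and merge logic, on_failed the move to the
    failed folder, on_in_progress whatever "skip for now" means in the job.
    """
    try:
        read_fn(path)
        return "processed"
    except Exception as exc:
        if is_footer_read_error(exc):
            # Likely a partially uploaded file: do not move it.
            on_in_progress(path)
            return "skipped"
        on_failed(path)
        return "failed"
```

Matching on message text is brittle, so it is worth confirming the exact wording against the stack trace the job actually logs.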

2 REPLIES

-werners-
Esteemed Contributor III

Do you have any idea how the file would be locked? That should not be the case (unless the file is actually still being written, so not finished yet).

jakubk
Contributor

That's the problem: it's not being locked (or fs.mv() isn't checking/honoring the lock). The upload process/tool is a third-party external tool.

I can see via the upload tool that the file upload is 'in progress'

I can also see the 0-byte destination file in the ADLS Gen2 container (while it's being uploaded).
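Since the in-progress upload shows up as a 0-byte file, one possible guard is to skip any file whose size is zero or still changing between two checks. This is only a sketch of that idea: `is_upload_settled` and its parameters are hypothetical, and `get_size` is whatever returns the current size in bytes (on Databricks something like `lambda p: dbutils.fs.ls(p)[0].size` might serve, which is an assumption to verify in your workspace).

```python
import time
from typing import Callable


def is_upload_settled(
    path: str,
    get_size: Callable[[str], int],
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Heuristic: True if the file is non-empty and its size did not change
    across two checks taken wait_seconds apart.

    get_size is a caller-supplied callable (assumption); sleep is injectable
    so the check can be exercised without real waiting.
    """
    first = get_size(path)
    if first == 0:
        # Matches the 0-byte placeholder seen during an upload.
        return False
    sleep(wait_seconds)
    second = get_size(path)
    return second == first
```

A stalled upload would also look stable, so this reduces the race rather than eliminating it. Having the uploader write to a temporary name and rename on completion, if the third-party tool supports that, would be a sturdier fix.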
