Databricks Community

jakubk · 09-07-2022

I have some Python code that takes Parquet files from an ADLS Gen2 (adlsv2) location and merges them into Delta tables. It runs as a workflow job on a schedule.

I have a try/except wrapper around this so that any files that fail are moved into a failed folder using dbutils.fs.mv, while files that are processed successfully are archived to a different location.
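A minimal sketch of that wrapper, assuming the read/merge step and the move are passed in as callables so the control flow can be shown outside Databricks. The names `process_files`, `read_and_merge`, `move`, `archive_dir` and `failed_dir` are hypothetical; in the real job `move` would be `dbutils.fs.mv` and `read_and_merge` would wrap `spark.read.parquet` plus the Delta merge. The broad `except` is exactly why a half-uploaded file ends up in the failed folder.

```python
import posixpath
from typing import Callable, Dict, Iterable


def process_files(
    paths: Iterable[str],
    read_and_merge: Callable[[str], None],  # hypothetical: wraps spark.read.parquet + MERGE
    move: Callable[[str, str], None],       # hypothetical: e.g. dbutils.fs.mv
    archive_dir: str,
    failed_dir: str,
) -> Dict[str, str]:
    """Merge each file, then archive it on success or move it to failed_dir on any error."""
    outcome: Dict[str, str] = {}
    for path in paths:
        name = posixpath.basename(path)
        try:
            read_and_merge(path)
        except Exception:
            # Any failure lands here, including a file that is still being uploaded.
            move(path, posixpath.join(failed_dir, name))
            outcome[path] = "failed"
        else:
            move(path, posixpath.join(archive_dir, name))
            outcome[path] = "archived"
    return outcome
```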

One scenario I've encountered is this:

An external upload process is still uploading somefile.parquet to adlsv2 when:

- the workflow job starts

- spark.read.parquet() fails with: Caused by: java.io.IOException: Could not read footer for file:

- dbutils.fs.mv moves the file (boo)

- the external upload fails because mv has removed the target file while the upload is still in progress

I'd assumed that mv would fail because there would be an exclusive lock on the file while it's being uploaded, but that's not the case (??)

Any suggestions on how to handle this?

Is there a way for me to check if a file is locked/being written to?
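Since the upload evidently does not hold a lock that blocks mv, one workaround is a heuristic rather than a real lock check: read the file's size, wait, read it again, and only process it if the size is non-zero and unchanged. This is a sketch under assumptions. `looks_stable` and `get_size` are hypothetical names, and on Databricks `get_size` might be built from the `size` field that `dbutils.fs.ls(path)` reports. It can still be fooled by an uploader that stalls for longer than the wait.

```python
import time
from typing import Callable


def looks_stable(
    path: str,
    get_size: Callable[[str], int],  # hypothetical: e.g. based on dbutils.fs.ls(path)[0].size
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Heuristic: treat a file as finished if its size is non-zero and unchanged after a wait.

    This is NOT a lock check; it only lowers the chance of picking up a file mid-upload.
    """
    first = get_size(path)
    sleep(wait_seconds)
    second = get_size(path)
    return first > 0 and first == second
```

Files that fail this check can simply be skipped and left in place for the next scheduled run instead of being moved.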

What's the exception to catch for this error? I've spent hours trying to figure it out, but catching the generic Python exceptions doesn't cover it, and I get a NameError for the specific Spark ones I try.
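A NameError means the exception class was never imported. Errors raised on the JVM side often reach Python as `py4j.protocol.Py4JJavaError` (it must be imported from `py4j.protocol` before it can be named in an `except`), although depending on the Spark version they may be wrapped in a PySpark exception class instead. A version-independent sketch, assuming the footer message appears in the exception's string form or in one of its chained causes, is to catch broadly and inspect the message. `is_incomplete_parquet_error` and `FOOTER_MARKER` are hypothetical names.

```python
FOOTER_MARKER = "Could not read footer"


def is_incomplete_parquet_error(exc: BaseException) -> bool:
    """Return True if exc, or any exception chained to it, mentions the Parquet footer error."""
    seen = set()
    current = exc
    # Walk explicit (__cause__) and implicit (__context__) chains, guarding against cycles.
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if FOOTER_MARKER in str(current):
            return True
        current = current.__cause__ or current.__context__
    return False
```

In the job's `except Exception as exc:` branch, a True result would mean "leave the file where it is and retry next run" rather than moving it to the failed folder.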

-werners- · 09-08-2022

Do you have any idea how the file would be locked? That should not be the case (unless the file is actually still being written, i.e. not finished yet).