I have some Python code that takes parquet files from an ADLS Gen2 location and merges them into Delta tables (run as a workflow job on a schedule)
I have a try/except wrapper around this so that any files that fail get moved into a "failed" folder using dbutils.fs.mv, while files that process successfully are archived off to a different location
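Roughly, the structure looks like this (a simplified sketch, not my actual code; the container, paths, table name, and merge condition are placeholders):

```python
from delta.tables import DeltaTable

landing = "abfss://landing@mystorageaccount.dfs.core.windows.net/incoming"   # placeholder path
archive = "abfss://landing@mystorageaccount.dfs.core.windows.net/archived"   # placeholder path
failed  = "abfss://landing@mystorageaccount.dfs.core.windows.net/failed"     # placeholder path

for f in dbutils.fs.ls(landing):
    try:
        df = spark.read.parquet(f.path)

        # merge into the target Delta table (placeholder table and join condition)
        (DeltaTable.forName(spark, "my_catalog.my_schema.my_table")
            .alias("t")
            .merge(df.alias("s"), "t.id = s.id")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())

        # processed OK -> archive the source file
        dbutils.fs.mv(f.path, f"{archive}/{f.name}")
    except Exception as e:
        # this is the bit that bites: mv runs even when the file is mid-upload
        dbutils.fs.mv(f.path, f"{failed}/{f.name}")
```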
One scenario I've encountered is this:
- the external upload process is uploading somefile.parquet to ADLS Gen2
- the workflow job starts
- spark.read.parquet() fails with - Caused by: java.io.IOException: Could not read footer for file:
- dbutils.fs.mv moves the file (boo)
- the external process fails because mv has deleted the target while the upload is in progress
I'd assumed mv would fail because there would be an exclusive lock on the file while it's being uploaded, but that's not the case (??)
Any suggestions on how to handle this?
Is there a way for me to check if a file is locked/being written to?
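The only workaround I've come up with is a size-stability heuristic like the sketch below: treat a file as "still being written" if its size changes over a short window. I don't know if that's sound though, and I'm not even sure an in-progress upload shows up with a growing size rather than just appearing once it's committed (the path and wait time here are made up for illustration):

```python
import time

def looks_stable(path, wait_seconds=30):
    """Crude heuristic: assume the file is still being uploaded if its
    size changes between two listings. Not a real lock check."""
    size_before = dbutils.fs.ls(path)[0].size
    time.sleep(wait_seconds)
    size_after = dbutils.fs.ls(path)[0].size
    return size_before == size_after

# only attempt to read/move files whose size hasn't changed recently
if looks_stable("abfss://landing@mystorageaccount.dfs.core.windows.net/incoming/somefile.parquet"):
    ...  # safe(ish) to process
```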
What's the error/exception to catch for this? I've spent hours trying to figure it out, but the generic Python ones don't cover it and I get a NameError for the specific Spark ones I try
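For reference, this is the sort of thing I've been attempting. I'm guessing at Py4JJavaError here since the footer error comes from the JVM side, and matching on the message string feels fragile, which is partly why I'm asking (file path is a placeholder):

```python
from py4j.protocol import Py4JJavaError

file_path = "abfss://landing@mystorageaccount.dfs.core.windows.net/incoming/somefile.parquet"  # placeholder

try:
    df = spark.read.parquet(file_path)
except Py4JJavaError as e:
    # the JVM-side exception text is the only place I can see "Could not read footer"
    if "Could not read footer" in str(e.java_exception):
        pass  # leave the file in place instead of moving it to failed?
    else:
        raise
```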