Oliver_Floyd
Contributor

Yes, no problem.

I have a Python program, called "post ingestion", that runs on a Databricks job cluster during the night and consists of:

  • inserting data into a Delta Lake table
  • executing an OPTIMIZE command on that table
  • executing a VACUUM command on that table
  • using a dbutils command to copy the folder containing the data of this Delta table to another folder (I dispatch the data to a lab and a qal Databricks workspace)
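
Roughly, the steps above look like the sketch below. This is a simplified illustration, not the actual program: the function name, the arguments, the append mode and the way `spark` and `dbutils` are passed in are all placeholders I made up for this post.

```python
def post_ingestion(spark, dbutils, df, table_path, target_paths):
    """Simplified sketch of the nightly steps; every name here is hypothetical."""
    # 1. Insert the new data into the Delta table (append mode is an assumption)
    df.write.format("delta").mode("append").save(table_path)

    # 2. Compact the table's small files
    spark.sql(f"OPTIMIZE delta.`{table_path}`")

    # 3. Delete files no longer referenced by the table and older than
    #    the retention threshold (default retention assumed here)
    spark.sql(f"VACUUM delta.`{table_path}`")

    # 4. Copy the whole table folder, including _delta_log, to each target folder
    for target in target_paths:
        dbutils.fs.cp(table_path, target, recurse=True)
```

On Databricks, `spark` and `dbutils` are the objects already available in the notebook or job, so the call would be something like `post_ingestion(spark, dbutils, df, source_path, [lab_path, qal_path])`, with the paths being whatever applies in your environment.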

Sometimes the copy fails with a 404 error:

java.io.FileNotFoundException: Operation failed: 'The specified path does not exist.', 404, GET, https://icmfcprddls001.dfs.core.windows.net/prd/curated/common/AS400/AS400.BC300.FT3RCPV/_delta_log/..., PathNotFound, 'The specified path does not exist. RequestId:e31cc75e-b01f-0042-1858-c23924000000 Time:2022-09-07T01:26:06.8939739Z

When the error occurs, no other program is using the Delta table.

In the morning, when I re-run the copy, everything runs fine.

They asked me to add this parameter to get more detail about Spark operations. But I want to know exactly what this parameter does, and MS was not able to give me more information.