Data Engineering
Where to find documentation about : spark.databricks.driver.strace.enabled

Oliver_Floyd
Contributor

Hello,

For a support request, Microsoft support asked me to add

spark.databricks.driver.strace.enabled true

to my cluster configuration.

MS was not able to send me a link to the documentation, and I did not find it on the Databricks website.

Can someone help me find documentation about this parameter?
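For context, here is where I am putting it: the flag goes in the cluster's Spark config (Advanced options → Spark in the UI), or in the `spark_conf` block if the cluster is defined as JSON. The payload below is only a sketch of that block, not my full cluster definition:

```json
{
  "spark_conf": {
    "spark.databricks.driver.strace.enabled": "true"
  }
}
```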

4 REPLIES

Debayan
Esteemed Contributor III

Hi @Oliver_Floyd, could you please give us a little context on why Microsoft asked you to add this Spark configuration?

Oliver_Floyd
Contributor

Yes, no problem.

I have a Python program, called "post ingestion", that runs on a Databricks job cluster during the night and consists of:

  • inserting data into a Delta Lake table
  • executing an OPTIMIZE command on that table
  • executing a VACUUM command on that table
  • then using a dbutils command to copy the folder containing the data of this delta table to another folder (I dispatch data to a lab and a qal Databricks workspace)

Sometimes the copy fails with a 404 error:

java.io.FileNotFoundException: Operation failed: 'The specified path does not exist.', 404, GET, https://icmfcprddls001.dfs.core.windows.net/prd/curated/common/AS400/AS400.BC300.FT3RCPV/_delta_log/..., PathNotFound, 'The specified path does not exist. RequestId:e31cc75e-b01f-0042-1858-c23924000000 Time:2022-09-07T01:26:06.8939739Z

When the error occurs, no other program is using the delta table.

In the morning, when I re-run the copy, everything runs fine.

They asked me to add this parameter to get more detail about Spark operations. But I want to know exactly what this parameter does, and MS was not able to give me more information.

Hubert-Dudek
Esteemed Contributor III

I would not use dbutils in production, as it uses only a single driver core. Instead, why not trigger Azure Data Factory from the job? It offers huge throughput, and it will be easier to analyze the copy results.

One core is not a problem for me, and I do not want to stack services.

Before changing our whole architecture, I will try to find a solution.
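One thing I may try first is a plain retry around the copy step, since the 404 seems transient (the re-run in the morning always works). A minimal sketch only: `copy_with_retry` and its attempt/backoff values are placeholders of mine, and `copy_fn` stands in for the real copy call:

```python
import time

def copy_with_retry(copy_fn, src, dst, attempts=3, backoff_s=30):
    """Retry a copy that can fail transiently (e.g. a 404 on _delta_log files).

    copy_fn: a callable taking (src, dst) that performs the actual copy.
    """
    last_err = None
    for attempt in range(1, attempts + 1):
        try:
            return copy_fn(src, dst)
        except Exception as err:  # in the real job, narrow this to the 404 case
            last_err = err
            if attempt < attempts:
                time.sleep(backoff_s)
    raise last_err
```

On the cluster, `copy_fn` would be something like `lambda s, d: dbutils.fs.cp(s, d, recurse=True)`; this does not explain the 404, but it would keep the nightly job from failing while we investigate.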
