Data Engineering
Where to find documentation about : spark.databricks.driver.strace.enabled

Oliver_Floyd
Contributor

Hello,

For a support request, Microsoft support asked me to add

spark.databricks.driver.strace.enabled true

to my cluster configuration.

MS was not able to send me a link to the documentation, and I did not find it on the Databricks website.

Can someone help me find documentation about this parameter?
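For context, here is where I am putting it: the flag goes in the cluster's Spark config (Advanced options → Spark in the UI), or in the `spark_conf` block if the cluster is defined as JSON. The payload below is only a sketch of that block, not my full cluster definition:

```json
{
  "spark_conf": {
    "spark.databricks.driver.strace.enabled": "true"
  }
}
```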

4 REPLIES

Debayan
Esteemed Contributor III

Hi @Oliver_Floyd, could you please give us a little context on why Microsoft asked you to add this Spark configuration?

Oliver_Floyd
Contributor

Yes, no problem.

I have a Python program, called "post ingestion", that runs on a Databricks job cluster during the night and consists of:

  • inserting data into a Delta Lake table
  • executing an OPTIMIZE command on that table
  • executing a VACUUM command on that table
  • then using a dbutils command to copy the folder containing the data of this delta table to another folder (I dispatch data to a lab and a qal Databricks workspace)

Sometimes the copy fails with a 404 error:

java.io.FileNotFoundException: Operation failed: 'The specified path does not exist.', 404, GET, https://icmfcprddls001.dfs.core.windows.net/prd/curated/common/AS400/AS400.BC300.FT3RCPV/_delta_log/..., PathNotFound, 'The specified path does not exist. RequestId:e31cc75e-b01f-0042-1858-c23924000000 Time:2022-09-07T01:26:06.8939739Z

When the error occurs, no other program is using the delta table.

In the morning, when I re-run the copy, everything runs fine.

They asked me to add this parameter to get more detail about Spark operations. But I want to know exactly what this parameter does, and MS was not able to give me more information.

Hubert-Dudek
Esteemed Contributor III

I would not use dbutils in production, as it uses only a single driver core. Instead, why not trigger Azure Data Factory from the job? It offers huge throughput, and it will be easier to analyze the copy results.

One core is not a problem for me, and I do not want to stack services.

Before changing our whole architecture, I will try to find a solution.
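One thing I may try first is a plain retry around the copy step, since the 404 seems transient (the re-run in the morning always works). A minimal sketch only: `copy_with_retry` and its attempt/backoff values are placeholders of mine, and `copy_fn` stands in for the real copy call:

```python
import time

def copy_with_retry(copy_fn, src, dst, attempts=3, backoff_s=30):
    """Retry a copy that can fail transiently (e.g. a 404 on _delta_log files).

    copy_fn: a callable taking (src, dst) that performs the actual copy.
    """
    last_err = None
    for attempt in range(1, attempts + 1):
        try:
            return copy_fn(src, dst)
        except Exception as err:  # in the real job, narrow this to the 404 case
            last_err = err
            if attempt < attempts:
                time.sleep(backoff_s)
    raise last_err
```

On the cluster, `copy_fn` would be something like `lambda s, d: dbutils.fs.cp(s, d, recurse=True)`; this does not explain the 404, but it would keep the nightly job from failing while we investigate.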
