Data Engineering

Where to find documentation about spark.databricks.driver.strace.enabled?

Oliver_Floyd
Contributor

Hello,

For a support request, Microsoft support asked me to add

spark.databricks.driver.strace.enabled true

to my cluster configuration.
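
A minimal sketch for checking, from a notebook or job, that the setting actually reached the running cluster. It says nothing about what the flag does. The helper name `strace_flag_is_set` is hypothetical, and it is an assumption that a cluster-level Spark config entry is readable through `spark.conf.get`:

```python
def strace_flag_is_set(spark, key="spark.databricks.driver.strace.enabled"):
    """Return True if the given Spark conf key is set to 'true'.

    `spark` is the SparkSession (predefined in Databricks notebooks).
    The second argument to spark.conf.get is the value returned when
    the key is absent, so a missing key is reported as False.
    """
    value = spark.conf.get(key, "false")
    return str(value).strip().lower() == "true"
```

In a notebook this would be called as `strace_flag_is_set(spark)`.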

Microsoft was not able to send me a link to the documentation, and I could not find it on the Databricks website.

Can someone help me find documentation about this parameter?

4 REPLIES

Debayan
Esteemed Contributor III

Hi @oliv vier, could you please give us a little more context on what Microsoft asked you to set in the Spark configuration, and why?

Oliver_Floyd
Contributor

Yes, no problem.

I have a Python program, called "post ingestion", that runs on a Databricks job cluster during the night and consists of:

  • inserting data into a Delta Lake table
  • executing an OPTIMIZE command on that table
  • executing a VACUUM command on that table
  • using a dbutils command to copy the folder containing the data of this Delta table to another folder (I dispatch the data for a lab and a qal Databricks workspace)
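
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the actual program: the function name, its parameters, and the append write mode are hypothetical, and `spark` and `dbutils` are the objects Databricks provides in a notebook or job.

```python
def run_post_ingestion(spark, dbutils, new_rows, table_name, table_path, target_paths):
    """Sketch of the nightly steps: insert, OPTIMIZE, VACUUM, then copy.

    new_rows     -- DataFrame holding the data to insert (assumed)
    table_name   -- name of the Delta table (assumed to be registered)
    table_path   -- folder that holds the table's data and _delta_log
    target_paths -- destination folders, e.g. one per workspace
    """
    # 1. Insert data into the Delta table (append mode is an assumption).
    new_rows.write.format("delta").mode("append").saveAsTable(table_name)

    # 2. Compact small files.
    spark.sql(f"OPTIMIZE {table_name}")

    # 3. Remove files no longer referenced by the table,
    #    using the table's default retention period.
    spark.sql(f"VACUUM {table_name}")

    # 4. Copy the whole table folder to each destination
    #    (third argument True means recursive copy).
    for target in target_paths:
        dbutils.fs.cp(table_path, target, True)
```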

Sometimes the copy fails with a 404 error:

java.io.FileNotFoundException: Operation failed: 'The specified path does not exist.', 404, GET, https://icmfcprddls001.dfs.core.windows.net/prd/curated/common/AS400/AS400.BC300.FT3RCPV/_delta_log/..., PathNotFound, 'The specified path does not exist. RequestId:e31cc75e-b01f-0042-1858-c23924000000 Time:2022-09-07T01:26:06.8939739Z

When the error occurs, no other program is using the Delta table.

In the morning, when I re-run the copy, everything runs fine.
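
Since a later re-run succeeds, one way to record each failure and try again within the same job is a small retry wrapper around the copy. This is only a sketch and does not address the root cause; `copy_with_retry` and its parameters are hypothetical names, and the attempt count and delay are arbitrary choices.

```python
import time

def copy_with_retry(copy_fn, source, target, attempts=3, delay_seconds=60, sleep=time.sleep):
    """Call copy_fn(source, target), retrying if it raises.

    Returns the number of the attempt that succeeded. If every attempt
    fails, the last exception is re-raised so the job still fails visibly.
    """
    for attempt in range(1, attempts + 1):
        try:
            copy_fn(source, target)
            return attempt
        except Exception as exc:
            # Log the error text (it includes the path and RequestId).
            print(f"copy attempt {attempt}/{attempts} failed: {exc}")
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

With dbutils, it could be called as `copy_with_retry(lambda s, t: dbutils.fs.cp(s, t, True), source, target)`.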

They asked me to add this parameter to get more detail about Spark's operations. But I want to know exactly what this parameter does, and Microsoft was not able to give me more information.

Hubert-Dudek
Esteemed Contributor III

I would not use dbutils in production, as it uses only one core of the driver. Instead, why not trigger an Azure Data Factory pipeline? It offers very high throughput, and it will be easier to analyze the copy results.

One core is not a problem for me, and I do not want to stack services.

Before changing our whole architecture, I will try to find a solution.
