yesterday
Hello Community,
I am facing an intermittent issue while running a Databricks job. The job fails with the following error message:
Run failed with error message:
Could not reach driver of cluster <cluster-id>.
Here are some additional details:
Job Setup: This job runs a standard ETL notebook
Behavior:
Questions for the community:
Any guidance or troubleshooting tips would be highly appreciated.
Note: I have attached the cluster log for reference.
yesterday - last edited yesterday
Hello @sandeepsuresh16 ,
Below are the answers to your questions:
The error "Could not reach driver of cluster <cluster-id>" can occur for several reasons. Use the following troubleshooting steps to check whether any of them applies to your case:
Move from an F-series (compute-optimized) driver to a memory-optimized one (e.g., E/D-series), or at least to a larger F-series node. Increase driver memory by choosing a larger node type, not just by setting spark.driver.memory in the Spark config. Reduce collect()/toPandas() calls and any driver-side loops or UDF work.
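To illustrate the driver-sizing point, here is a minimal sketch of changing only the driver node type in a job cluster spec while leaving the workers as they are. It assumes the spec is a plain dict using the `node_type_id` and `driver_node_type_id` fields from the Clusters API. The helper name `with_memory_optimized_driver`, the runtime version, and both VM sizes are placeholders I made up for the example, so substitute whatever is actually available in your workspace.

```python
import copy


def with_memory_optimized_driver(cluster_spec, driver_node_type):
    """Return a copy of a cluster spec with only the driver node type changed.

    Hypothetical helper for illustration; it does not call any Databricks API.
    """
    updated = copy.deepcopy(cluster_spec)
    # Workers keep node_type_id; the driver gets its own, larger-memory type.
    updated["driver_node_type_id"] = driver_node_type
    return updated


# Placeholder spec: the values below are assumptions, not taken from the job in question.
current_spec = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_F8s_v2",  # compute-optimized workers
    "num_workers": 4,
}

# Placeholder memory-optimized size for the driver.
new_spec = with_memory_optimized_driver(current_spec, "Standard_E8ds_v4")
```

Keeping the workers on the compute-optimized type and moving only the driver is one way to limit the cost impact. Whether it is enough depends on how much data the notebook pulls back to the driver.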
If you launch many notebooks/tasks at once, raise the REPL launch timeout (Jobs→Compute→Spark config):
Please let me know if you have any further questions.
Thanks
14 hours ago
Hello Anudeep,
Thank you for your detailed response and the helpful recommendations.
I would like to provide some additional context:
Regarding your suggestion about changing to a memory-optimized driver series, thank you for the recommendation — we will definitely consider this option.
Please let me know if there are any additional logs or metrics you would recommend checking in this specific scenario.
Thanks & Regards,
Sandeep
14 hours ago - last edited 14 hours ago
Hi @sandeepsuresh16 ,
Check the two articles below. One of them suggests metrics to check. You will also find some suggestions there on how to limit the occurrence of this problem.
Workflows are failing with a 'Could not reach driver of the cluster' error - Databricks
Job run fails with error message “Could not reach driver of cluster” - Databricks
12 hours ago
You can follow the recommendations and also check the KB articles shared by @szymon_dybczak. I think those should help you.