Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job Run Failed - "Cluster became unreachable during run" with Cause: "requirement failed: Execution

apurvasawant
New Contributor II

I'm encountering a failure while running a job in Databricks. The run fails with the following error message:

Cluster became unreachable during run Cause: requirement failed: Execution is done
Details:

Runtime version: 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

Workers: 4

Any retry attempts? - No

Observed behavior: The job appears to start normally but fails shortly afterward with the above message. No specific error is shown in the logs except the generic "Execution is done" message.

Has anyone else faced this issue? What could be the root cause, and how can I avoid it?

Thanks in advance!

1 Reply

mmayorga
Databricks Employee

Hello @apurvasawant 

I'm sorry you're seeing this behavior with Jobs; admittedly, these generic messages don't help much.

When this happens, I suggest stepping back to review your job's configuration and working through some troubleshooting questions:

  • What is the task type that is hitting this exception: a notebook, Python, or Scala code?
    • Are you able to capture how far the task progresses before it fails?
    • Perhaps the task is overloading the driver; if so, consider distributing the work across your cluster nodes instead of pulling everything back to the driver.
    • Validate that your cluster can handle a simplified version of the task, then incrementally add back logic until you find what actually triggers the problem.
  • Which cluster is configured for this task?
    • Is it an interactive cluster? Or a Job Cluster?
    • What runtime, memory, and cores is your cluster configured for? Is it capable of handling your data volume and processing needs?
    • Have you tried using serverless?
  • What is the profile of the data the task will handle?
    • You may want to start with a small portion of the total volume of data and incrementally add more until you find the culprit.
  • Monitoring:
    • Check the Spark UI, the driver logs, and the cluster event log around the time of the failure for signs of memory pressure or lost nodes.
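The "start small and grow" approach above can be sketched as a simple ramp-up loop. Here `process_fraction` is a hypothetical stand-in for your real task logic (in a notebook you would read a sampled subset of the source data instead); the fractions are illustrative:

```python
def process_fraction(fraction):
    # Hypothetical stand-in for the real task. In Databricks this might be:
    #   df = spark.read.table("source").sample(fraction=fraction)
    #   df.transform(...).write.mode("overwrite").saveAsTable("target")
    rows = int(1_000_000 * fraction)  # simulate a workload proportional to the sample
    return rows

def ramp_up(fractions=(0.01, 0.1, 0.5, 1.0)):
    """Run the task on growing slices of the data to isolate where it starts failing."""
    results = {}
    for f in fractions:
        try:
            results[f] = process_fraction(f)
            print(f"fraction={f}: OK ({results[f]} rows)")
        except Exception as exc:
            print(f"fraction={f}: FAILED -> {exc}")
            break
    return results
```

If the job succeeds at small fractions and dies at a larger one, that points to data volume (memory, skew) rather than code logic.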
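For the cluster questions above, it can help to pin down the job-cluster definition explicitly. A minimal sketch of the relevant Jobs API settings follows; the field names follow the Databricks Jobs API, but the node type and retry count are assumptions you would size for your own data:

```python
# Sketch of Jobs API task/cluster settings (values are illustrative assumptions).
job_cluster_settings = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",  # matches the 15.4 LTS runtime in the question
        "num_workers": 4,
        "node_type_id": "i3.xlarge",  # assumed; choose a type with enough memory for your volume
    },
    "max_retries": 1,  # enabling one retry helps distinguish transient from systematic failures
}
```

Since no retries were configured, setting `max_retries` to a small value is a cheap way to see whether the failure is reproducible on every run or intermittent.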

Unfortunately, I can't offer a definitive answer to your problem, but hopefully these questions give you some different perspectives to consider in the design of your job and the components surrounding your task.

Thank you
