Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job Run Failed - "Cluster became unreachable during run" with Cause: "requirement failed: Execution

apurvasawant
New Contributor II

I'm encountering a failure while running a job in Databricks. The run fails with the following error message:

Cluster became unreachable during run Cause: requirement failed: Execution is done
Details:

Runtime version: 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

Workers: 4

Any retry attempts? - No

Observed behavior: The job appears to start normally but fails shortly afterward with the above message. No specific error is shown in the logs except the generic "Execution is done" message.

Has anyone else faced this issue? What could be the root cause, and how can I avoid it?

Thanks in advance!

1 Reply

mmayorga
Databricks Employee

Hello @apurvasawant 

I'm sorry you're seeing this behavior with Jobs; admittedly, these generic messages don't help much.

When this happens, I suggest stepping back to review your job's configuration and working through some troubleshooting questions:

  • What is the task type that is hitting this exception: a notebook, Python, or Scala code?
    • Are you able to capture how far the task progresses before it fails?
    • Perhaps the task is overloading the driver; if so, consider distributing the work across your cluster nodes instead of pulling everything back to the driver.
    • Validate that your cluster can handle a simplified version of the task, then incrementally add back logic until you find what actually triggers the problem.
  • Which cluster is configured for this task?
    • Is it an interactive cluster? Or a Job Cluster?
    • What runtime, memory, and cores is your cluster configured for? Is it capable of handling your data volume and processing needs?
    • Have you tried using serverless?
  • What is the profile of the data the task will handle?
    • You may want to start with a small portion of the total volume of data and incrementally add more until you find the culprit.
  • Monitoring:
    • Check the Spark UI, the driver logs, and the cluster event log around the time of the failure for signs of memory pressure or lost nodes.
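The "start small and grow" approach above can be sketched as a simple ramp-up loop. Here `process_fraction` is a hypothetical stand-in for your real task logic (in a notebook you would read a sampled subset of the source data instead); the fractions are illustrative:

```python
def process_fraction(fraction):
    # Hypothetical stand-in for the real task. In Databricks this might be:
    #   df = spark.read.table("source").sample(fraction=fraction)
    #   df.transform(...).write.mode("overwrite").saveAsTable("target")
    rows = int(1_000_000 * fraction)  # simulate a workload proportional to the sample
    return rows

def ramp_up(fractions=(0.01, 0.1, 0.5, 1.0)):
    """Run the task on growing slices of the data to isolate where it starts failing."""
    results = {}
    for f in fractions:
        try:
            results[f] = process_fraction(f)
            print(f"fraction={f}: OK ({results[f]} rows)")
        except Exception as exc:
            print(f"fraction={f}: FAILED -> {exc}")
            break
    return results
```

If the job succeeds at small fractions and dies at a larger one, that points to data volume (memory, skew) rather than code logic.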
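For the cluster questions above, it can help to pin down the job-cluster definition explicitly. A minimal sketch of the relevant Jobs API settings follows; the field names follow the Databricks Jobs API, but the node type and retry count are assumptions you would size for your own data:

```python
# Sketch of Jobs API task/cluster settings (values are illustrative assumptions).
job_cluster_settings = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",  # matches the 15.4 LTS runtime in the question
        "num_workers": 4,
        "node_type_id": "i3.xlarge",  # assumed; choose a type with enough memory for your volume
    },
    "max_retries": 1,  # enabling one retry helps distinguish transient from systematic failures
}
```

Since no retries were configured, setting `max_retries` to a small value is a cheap way to see whether the failure is reproducible on every run or intermittent.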

Unfortunately, I can't offer a definitive answer to your problem, but hopefully these questions give you some different perspectives to consider in the design of your job and the components surrounding your task.

Thank you
