Data Engineering

Cluster start issues

User16826994223
Honored Contributor III

Some of our jobs are failing in prod with the error message below. These jobs run on pool-backed clusters.

Can you please check and let us know the reason for this?

Run result unavailable: job failed with error message

Unexpected failure while waiting for the cluster (0604-056775-teaks96) to be ready.Cause Unexpected state for cluster (0604-056775-teaks96): UNEXPECTED_LAUNCH_FAILURE(SERVICE_FAULT): databricks_error_message:Encountered unexpected failure on instance InstanceId(63901da48df74d539b078907d929527d), failure code: DEFUNCT_RESOURCE

message: "Defunct Resource Detected"

Accepted Solutions

Mooune_DBU
Valued Contributor

@Kunal Gaurav, this status code occurs only under one of two conditions:

  1. We’re able to request the instances for the cluster but can’t bootstrap them in time.
  2. We set up the containers on each instance but can’t start them in time.

In this case, the failure comes from an edge case in our service cleanup logic: some containers or clusters can be misidentified as zombie resources even though nothing is actually wrong with them. We are working on improving the classification logic and should have a fix deployed soon.

That said, don't hesitate to create an Engineering Support ticket with the workspace ID, cluster ID, and region so Databricks can confirm whether something similar has happened in that region.
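If it helps when filling in the ticket, here is a minimal sketch that pulls the identifiers out of the error text quoted in the question. The function name `parse_launch_failure` and the regular expressions are my own assumptions based on that one message, not an official Databricks format or API, so the patterns may need adjusting for other error variants. The workspace ID and region are not in the message and still have to be added by hand.

```python
import re

# Error text copied from the failed job run in the question.
ERROR_MESSAGE = (
    "Unexpected failure while waiting for the cluster (0604-056775-teaks96) "
    "to be ready.Cause Unexpected state for cluster (0604-056775-teaks96): "
    "UNEXPECTED_LAUNCH_FAILURE(SERVICE_FAULT): databricks_error_message:"
    "Encountered unexpected failure on instance "
    "InstanceId(63901da48df74d539b078907d929527d), "
    "failure code: DEFUNCT_RESOURCE"
)


def parse_launch_failure(message: str) -> dict:
    """Extract identifiers from a cluster launch failure message.

    Hypothetical helper: the patterns are inferred from a single sample
    message. Fields that are not found are returned as None.
    """
    cluster = re.search(r"cluster \(([^)]+)\)", message)
    # Matches CODE(TYPE), e.g. UNEXPECTED_LAUNCH_FAILURE(SERVICE_FAULT).
    termination = re.search(r"([A-Z_]+)\(([A-Z_]+)\)", message)
    instance = re.search(r"InstanceId\(([0-9a-f]+)\)", message)
    failure = re.search(r"failure code:\s*([A-Z_]+)", message)

    return {
        "cluster_id": cluster.group(1) if cluster else None,
        "termination_code": termination.group(1) if termination else None,
        "termination_type": termination.group(2) if termination else None,
        "instance_id": instance.group(1) if instance else None,
        "failure_code": failure.group(1) if failure else None,
    }


if __name__ == "__main__":
    # Print one "field: value" line per extracted identifier.
    for field, value in parse_launch_failure(ERROR_MESSAGE).items():
        print(f"{field}: {value}")
```

Running it over the job's error message gives you the cluster ID, instance ID, and failure code in one place, ready to paste into the ticket.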

