cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Unable to start Cluster in Databricks because of `BOOTSTRAP_TIMEOUT`

Sahha_Krishna
New Contributor

Unable to start the Cluster in AWS-hosted Databricks because of the below reason

{
  "reason": {
    "code": "BOOTSTRAP_TIMEOUT",
    "parameters": {
      "databricks_error_message": "[id: InstanceId(i-0634ee9c2d420edc8), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-1653812411865398-41432fd9-c426-43ee-84fd-f161341ac1db), lastStatusChangeTime: 1690489336310, groupIdOpt Some(0),requestIdOpt Some(0717-085009-xgjqxcqe-70e3f1a0-0524-43cc-b),version 1] with threshold 700 seconds timed out after 701058 milliseconds. Please check network connectivity from the data plane to the control plane.",
      "instance_id": "i-0634ee9c2d420edc8"
    }
  },
  "add_node_failure_details": {
    "failure_count": 2,
    "resource_type": "container",
    "will_retry": false
  }
}  

It all started when we arranged a Peering connection between Databricks default VPC and out VPC. Rolled back all the changes, but the problem still persists. In AWS, I can see the EC2 instances are initialized and running but something wrong other than that.

 Any help would be greatly appreciated.

1 REPLY 1

User16539034020
Contributor II
Contributor II

Hi, Sahha:

Thanks for contacting Databricks Support. 

This is the common type of error, which indicates that the bootstrap failed due to a misconfigured data plane network. Databricks requested EC2 instances for a new cluster, but encountered a long delay while waiting for the EC2 instance to bootstrap, and connect to the control plane. The cluster manager terminates the instances, and reports this error.

Please go to AWS console and download the EC2 system log by following the instructions: 

  1. Open the Amazon EC2 console 
  2. In the left navigation pane, choose Instances, and select the instance using the instance ID.
    The instance ID, which starts with i-xxxxxx, will be printed in the Event Log section of the cluster details page. Note that the instance must be terminated within the last hour; otherwise, it will not show up in the list. If the cluster creation failure happened a long time ago, restart the cluster to reproduce the error first.
  3. Choose Actions > Monitor and troubleshoot > Get System Log.
  4. Click the Download button to download the system log. It may take a few minutes for the system log to show up if the cluster is just started.

Check the system log, look for messages starting with the prefix: [timestamp, Bootstrap Event]. Search for FAILED_MESSAGE, and use a Base64 decode tool to decode the message. The message should give the reason why bootstrap failed.

Regards,

 

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.