
Unable to start Cluster in Databricks because of `BOOTSTRAP_TIMEOUT`

Sahha_Krishna
New Contributor

I am unable to start a cluster in AWS-hosted Databricks. It fails with the following termination reason:

{
  "reason": {
    "code": "BOOTSTRAP_TIMEOUT",
    "parameters": {
      "databricks_error_message": "[id: InstanceId(i-0634ee9c2d420edc8), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-1653812411865398-41432fd9-c426-43ee-84fd-f161341ac1db), lastStatusChangeTime: 1690489336310, groupIdOpt Some(0),requestIdOpt Some(0717-085009-xgjqxcqe-70e3f1a0-0524-43cc-b),version 1] with threshold 700 seconds timed out after 701058 milliseconds. Please check network connectivity from the data plane to the control plane.",
      "instance_id": "i-0634ee9c2d420edc8"
    }
  },
  "add_node_failure_details": {
    "failure_count": 2,
    "resource_type": "container",
    "will_retry": false
  }
}  

It all started when we set up a peering connection between the Databricks default VPC and our VPC. We rolled back all the changes, but the problem persists. In AWS, I can see that the EC2 instances are initialized and running, but something else is going wrong.

 Any help would be greatly appreciated.


User16539034020
Databricks Employee

Hi, Sahha:

Thanks for contacting Databricks Support. 

This is a common error. It usually indicates that bootstrap failed because of a misconfigured data plane network. Databricks requested EC2 instances for the new cluster but timed out (here, after the 700-second threshold) while waiting for the instances to bootstrap and connect to the control plane. When that happens, the cluster manager terminates the instances and reports this error.
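The termination reason posted in the question already carries the figures this explanation refers to: the instance ID needed for the steps below, and the 700-second threshold compared with the 701058 ms that actually elapsed. As a minimal sketch, assuming the payload is copied verbatim as JSON, the hypothetical helper `summarize_termination` pulls those values out. The regular expression is an assumption based only on the single message in this thread; other Databricks error messages may be worded differently.

```python
import json
import re

# Assumed wording, taken from the one BOOTSTRAP_TIMEOUT message in this thread.
_TIMEOUT_RE = re.compile(r"threshold (\d+) seconds timed out after (\d+) milliseconds")


def summarize_termination(reason_json: str) -> dict:
    """Extract the error code, instance ID and timeout figures from a termination reason."""
    data = json.loads(reason_json)
    reason = data["reason"]
    params = reason.get("parameters", {})
    summary = {
        "code": reason.get("code"),
        "instance_id": params.get("instance_id"),
        "threshold_s": None,
        "elapsed_ms": None,
        "timed_out": None,
    }
    match = _TIMEOUT_RE.search(params.get("databricks_error_message", ""))
    if match:
        summary["threshold_s"] = int(match.group(1))
        summary["elapsed_ms"] = int(match.group(2))
        # Compare in milliseconds to avoid float rounding.
        summary["timed_out"] = summary["elapsed_ms"] > summary["threshold_s"] * 1000
    return summary
```

The `instance_id` value is the one to look up in the EC2 console in step 2.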

Please go to the AWS console and download the EC2 system log as follows:

  1. Open the Amazon EC2 console 
  2. In the left navigation pane, choose Instances, and select the instance using the instance ID.
    The instance ID, which starts with i-xxxxxx, is printed in the Event Log section of the cluster details page. Note that the instance must have been terminated within the last hour; otherwise, it will not appear in the list. If the cluster creation failure happened a long time ago, restart the cluster to reproduce the error first.
  3. Choose Actions > Monitor and troubleshoot > Get System Log.
  4. Click the Download button to download the system log. If the cluster has just started, it may take a few minutes for the system log to appear.
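If you prefer scripting over the console, the same log can be fetched with the AWS SDK for Python. This is a minimal sketch, assuming boto3 is installed and credentials for the workspace's AWS account are configured; `fetch_system_log` is a hypothetical helper name. It calls the EC2 `GetConsoleOutput` API, which is what the Get System Log action uses, and boto3 returns the `Output` field already Base64-decoded.

```python
from typing import Any, Optional


def fetch_system_log(ec2_client: Any, instance_id: str) -> Optional[str]:
    """Return the EC2 system (console) log for instance_id, or None if none is available yet.

    ec2_client is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    response = ec2_client.get_console_output(InstanceId=instance_id)
    # "Output" is absent until the instance has produced console output,
    # which can take a few minutes after the cluster starts.
    return response.get("Output")


# Example usage (the region is a placeholder assumption):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   log = fetch_system_log(ec2, "i-0634ee9c2d420edc8")
```

Passing the client in, rather than creating it inside the function, keeps the region and credentials choice with the caller.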

Check the system log and look for messages starting with the prefix [timestamp, Bootstrap Event]. Search for FAILED_MESSAGE, and use a Base64 decoding tool to decode the message. The decoded message should give the reason why bootstrap failed.
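The search-and-decode step can also be scripted. This is a minimal sketch, assuming each relevant line contains the text `Bootstrap Event` and that `FAILED_MESSAGE` is followed (after optional `:`, `=`, or whitespace) by a single standard Base64 token; the real log layout may differ, so adjust the pattern to what you see in your log. `decode_failed_messages` is a hypothetical helper.

```python
import base64
import re
from typing import List

# Assumed layout: FAILED_MESSAGE, optional separators, then one Base64 token.
_FAILED_RE = re.compile(r"FAILED_MESSAGE[\s:=]*([A-Za-z0-9+/]+={0,2})")


def decode_failed_messages(system_log: str) -> List[str]:
    """Return the decoded FAILED_MESSAGE payloads from Bootstrap Event lines, in order."""
    decoded = []
    for line in system_log.splitlines():
        if "Bootstrap Event" not in line:
            continue  # only bootstrap events are relevant here
        for token in _FAILED_RE.findall(line):
            try:
                raw = base64.b64decode(token, validate=True)
            except ValueError:  # binascii.Error subclasses ValueError
                continue  # not valid Base64; skip rather than guess
            decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded
```

For example, `decode_failed_messages(open("system-log.txt").read())` (the filename is a placeholder) returns the decoded messages in the order they appear in the log.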

Regards,

 
