cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Bootstrap Timeout: DURING CLUSTER START

Avinash_Narala
New Contributor III

Hi,

When I start a cluster, I am getting below error:

Bootstrap Timeout:

[id: InstanceId(i-05bbcfbb30027ce2c), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-2247916891060257-01b40fb4-3eb1-4a26-99b4-30d6aa0bfe83), lastStatusChangeTime: 1709723780222, groupIdOpt Some(0),requestIdOpt Some(0228-090433-7f92xtaj-0dcafb3c-92e1-455f-9),version 1] with threshold 700 seconds timed out after 705482 milliseconds. Please check network connectivity from the data plane to the control plane.

I tried below approaches:

  1.  Reviewed the EC-2 logs in AWS for more details about the error-  kauditd hold queue overflow
  2. checked subnet route table config and security groups  in AWS and found that no changes have been made.

But still the cluster isn't spinning up.

Can you please help me with this.

3 REPLIES 3

Kaniz
Community Manager
Community Manager

Hi @Avinash_NaralaThe Bootstrap Timeout issue during cluster start can be quite frustrating, but let’s troubleshoot it together. Here are some steps you can take to address this problem:

  1. Network Connectivity Check:

    • As the error message suggests, verify the network connectivity between the data plane and the control plane. Ensure that there are no network issues causing delays or timeouts.
    • Double-check the security groups and subnet route table configurations for the VPC used by your Databricks instance. Make sure they meet the prerequisites.
    • If you’re using AWS, ensure that the necessary routes and security group rules are correctly configured.
  2. EC2 Logs Investigation:

    • Review the EC2 logs in AWS for more details about the error. Specifically, look for any indications related to kauditd hold queue overflow.
    • Investigate any other relevant logs or events that might shed light on the issue.
  3. Cluster Initialization Delay:

    • Sometimes, clusters experience delays during initialization due to various factors. It’s possible that the instance is taking longer than expected to start.
    • If the issue persists, consider reaching out to your cloud provider’s support team for further assistance.
  4. Upcoming Fix:

Stay patient, and hopefully, the upcoming fix will resolve the Bootstrap Timeout problem for you! 🚀

 

dhtubong
New Contributor II

Hello,

We encountered bootstrap timeout issue while spinning up an AWS cluster in Community edition for the last few days.

"zone_id": "us-west-2c"

Bootstrap Timeout:
Node daemon ping timeout in 780000 ms for instance i-00f21ee2d3ca61424 @ 10.172.245.1. Please check network connectivity between the data plane and the control plane.

dhtubong
New Contributor II

Hello - if you're using DB Community Edition and having Bootstrap Timeout issue, then below resolution may help.

Error: Bootstrap Timeout:
Node daemon ping timeout in 780000 ms for instance i-00f21ee2d3ca61424 @ 10.172.245.1. Please check network connectivity between the data plane and the control plane.

Resolution: Open InPrivate Window (MSFT) or New Incognito window (GOOGL).

Best..

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.