
Bootstrap timeout on instance creation

tjkentSI
New Contributor

I am getting the following error on instance creation:

{
  "reason": {
    "code": "BOOTSTRAP_TIMEOUT",
    "parameters": {
      "databricks_error_message": "[id: InstanceId(i-0e552e85c37c9da2d), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-1266184353274328-a3cf37c9-8da9-4828-8cff-aa094025aad5), lastStatusChangeTime: 1689896414325, groupIdOpt Some(0),requestIdOpt Some(0720-205452-ixn0b8ac-78df74c0-44b8-4eed-9),version 2] with threshold 700 seconds timed out after 708057 milliseconds. Please check network connectivity from the data plane to the control plane.",
      "instance_id": "i-0e552e85c37c9da2d"
    }
  },
  "add_node_failure_details": {
    "failure_count": 1,
    "resource_type": "container",
    "will_retry": false
  }
}
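
For triage, the useful fields in this termination reason can be pulled out programmatically. The following is a minimal sketch, assuming the JSON has the shape shown above. The function name summarize_bootstrap_timeout and the regular expressions are illustrative assumptions fitted to this one message, not a documented Databricks format.

```python
import json
import re


def summarize_bootstrap_timeout(raw):
    """Extract the key fields from a termination-reason JSON string.

    Assumes the layout of the error shown above; the message patterns
    are guesses based on that single example.
    """
    reason = json.loads(raw)["reason"]
    params = reason.get("parameters", {})
    message = params.get("databricks_error_message", "")

    # e.g. "... with threshold 700 seconds timed out after 708057 milliseconds."
    timing = re.search(
        r"threshold (\d+) seconds timed out after (\d+) milliseconds", message
    )
    # e.g. "status: INSTANCE_INITIALIZING"
    status = re.search(r"status: (\w+)", message)

    return {
        "code": reason.get("code"),
        "instance_id": params.get("instance_id"),
        "status": status.group(1) if status else None,
        "threshold_s": int(timing.group(1)) if timing else None,
        "elapsed_ms": int(timing.group(2)) if timing else None,
    }
```

Applied to the error above, this would show that the instance was still in INSTANCE_INITIALIZING when the 700 second threshold elapsed, which is consistent with the bootstrap never completing rather than the instance failing to launch.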

This is from the system log for that instance.

[Bootstrap Event] DNS output for databricks-prod-artifacts-us-east-1.s3.amazonaws.com: 
Server:		10.10.0.2
Address:	10.10.0.2#53

Non-authoritative answer:
databricks-prod-artifacts-us-east-1.s3.amazonaws.com	canonical name = s3-1-w.amazonaws.com.
s3-1-w.amazonaws.com	canonical name = s3-w.us-east-1.amazonaws.com.
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.217.164.81
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.217.197.9
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.217.236.193
Name:	s3-w.us-east-1.amazonaws.com
Address: 54.231.160.105
Name:	s3-w.us-east-1.amazonaws.com
Address: 54.231.166.201
Name:	s3-w.us-east-1.amazonaws.com
Address: 3.5.8.156
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.216.112.83
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.217.112.113

[Bootstrap Event] Can reach databricks-prod-artifacts-us-east-1.s3.amazonaws.com: [FAILED]
[Bootstrap Event] DNS output for databricks-prod-artifacts-us-west-2.s3.us-west-2.amazonaws.com: 
Server:		10.10.0.2
Address:	10.10.0.2#53

Non-authoritative answer:
databricks-prod-artifacts-us-west-2.s3.us-west-2.amazonaws.com	canonical name = s3-r-w.us-west-2.amazonaws.com.
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.192.26
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.194.242
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.196.74
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.209.98
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.218.192.177
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 3.5.82.203
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.128.130
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.138.58

[  252.523636] audit: kauditd hold queue overflow
[  252.598843] audit: kauditd hold queue overflow
[  252.639047] audit: kauditd hold queue overflow
[Bootstrap Event] Can reach databricks-prod-artifacts-us-west-2.s3.us-west-2.amazonaws.com: [FAILED]
[Bootstrap Event] DNS output for databricks-update-oregon.s3.us-west-2.amazonaws.com: 
Server:		10.10.0.2
Address:	10.10.0.2#53

Non-authoritative answer:
databricks-update-oregon.s3.us-west-2.amazonaws.com	canonical name = s3-r-w.us-west-2.amazonaws.com.
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.218.133.106
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.218.153.81
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.218.196.49
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 3.5.80.138
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 3.5.84.111
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.195.66
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.210.114
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.240.218

[Bootstrap Event] Can reach databricks-update-oregon.s3.us-west-2.amazonaws.com: [FAILED]


I was not able to SSH into the instance that was being created, so I launched a new instance in the same security group and tested connectivity from there:

 

 

ubuntu@ip-10-10-81-10:~$ nc -vz databricks-update-oregon.s3.us-west-2.amazonaws.com 443
Connection to databricks-update-oregon.s3.us-west-2.amazonaws.com (52.218.179.122) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -vz databricks-prod-artifacts-us-east-1.s3.amazonaws.com 443
Connection to databricks-prod-artifacts-us-east-1.s3.amazonaws.com (54.231.128.137) 443 port [tcp/https] succeeded!

ubuntu@ip-10-10-81-10:~$ nc -zv ireland.cloud.databricks.com 443
Connection to ireland.cloud.databricks.com (3.250.244.127) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv tunnel.eu-west-1.cloud.databricks.com 443
Connection to tunnel.eu-west-1.cloud.databricks.com (3.250.244.114) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv s3.amazonaws.com 443
Connection to s3.amazonaws.com (52.217.142.136) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv s3.eu-west-1.amazonaws.com 443
Connection to s3.eu-west-1.amazonaws.com (52.92.17.176) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv sts.amazonaws.com 443
Connection to sts.amazonaws.com (54.239.29.25) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv sts.eu-west-1.amazonaws.com 443
Connection to sts.eu-west-1.amazonaws.com (54.239.32.126) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv kinesis.eu-west-1.amazonaws.com 443
Connection to kinesis.eu-west-1.amazonaws.com (99.80.34.206) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com 3306
Connection to md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com (54.73.70.178) 3306 port [tcp/mysql] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -uzv 3.250.244.112 443
Connection to 3.250.244.112 443 port [udp/https] succeeded!
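
The manual nc checks above can be repeated in one pass with a small script. This is a minimal sketch using only the Python standard library; the names ENDPOINTS, can_connect and report are illustrative assumptions, and the host list is copied from the TCP commands above. Like nc -z, it only confirms that a TCP handshake completes, and a success from one instance says nothing about an instance in a different subnet or behind a different route table.

```python
import socket

# (host, port) pairs copied from the nc commands above.
ENDPOINTS = [
    ("databricks-update-oregon.s3.us-west-2.amazonaws.com", 443),
    ("databricks-prod-artifacts-us-east-1.s3.amazonaws.com", 443),
    ("ireland.cloud.databricks.com", 443),
    ("tunnel.eu-west-1.cloud.databricks.com", 443),
    ("s3.eu-west-1.amazonaws.com", 443),
    ("sts.eu-west-1.amazonaws.com", 443),
    ("kinesis.eu-west-1.amazonaws.com", 443),
]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def report(endpoints=ENDPOINTS, timeout=5.0):
    """Check every endpoint, print one line per result, return a dict."""
    results = {}
    for host, port in endpoints:
        ok = can_connect(host, port, timeout)
        results[(host, port)] = ok
        print(f"{host}:{port} {'succeeded' if ok else 'FAILED'}")
    return results
```

Calling report() on a test instance launched in the same subnet as the failing node gives a result that can be compared line by line against the [FAILED] entries in the bootstrap log.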

 


I also tried allowing all inbound and outbound TCP and UDP traffic on the security group, but instance creation still failed. I am looking for guidance on how to dig deeper into this issue.

Thanks.

1 REPLY

User16539034020
Contributor II

Hello, 

Thanks for contacting Databricks Support. 

The bootstrap log line [Bootstrap Event] Can reach databricks-prod-artifacts-us-east-1.s3.amazonaws.com: [FAILED] indicates that the instance could not reach a Databricks-related AWS S3 bucket. The DNS output for databricks-update-oregon.s3.us-west-2.amazonaws.com shows that the DNS server at 10.10.0.2 was queried and returned addresses, so the name resolved and the failure occurred when connecting. This type of issue usually arises from network configuration or connectivity problems.

You were able to connect to databricks-update-oregon.s3.us-west-2.amazonaws.com on port 443 (HTTPS) from the new instance you created in the same security group. Given that, the next step is to find out what differs for the first instance, starting with why you could not SSH into it:

  • Since the new instance is reachable, compare its configuration with that of the first instance to identify any discrepancies.
  • Verify that the route table associated with the subnet has a route for outbound traffic and the routes needed for inbound SSH traffic.
  • Check the network ACLs for rules that might be blocking inbound SSH traffic.
  • Check that DNS hostnames are enabled for the VPC.
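
The route table, network ACL and DNS checks above can also be scripted. The following is a rough sketch that assumes a boto3 EC2 client is passed in as ec2 (for example boto3.client("ec2")); the function names and the idea of evaluating ACL entries locally are illustrative assumptions, not part of any Databricks tooling. The three describe_* calls are standard EC2 API operations.

```python
import ipaddress


def vpc_dns_flags(ec2, vpc_id):
    """Return the VPC's two DNS attributes as booleans."""
    hostnames = ec2.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )
    support = ec2.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )
    return {
        "enableDnsHostnames": hostnames["EnableDnsHostnames"]["Value"],
        "enableDnsSupport": support["EnableDnsSupport"]["Value"],
    }


def default_route_targets(ec2, subnet_id):
    """List the targets of 0.0.0.0/0 routes for the subnet's route table.

    Only explicitly associated route tables are returned; a subnet with
    no explicit association uses the VPC's main route table instead.
    """
    resp = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    targets = []
    for table in resp["RouteTables"]:
        for route in table["Routes"]:
            if route.get("DestinationCidrBlock") == "0.0.0.0/0":
                targets.append(
                    route.get("NatGatewayId")
                    or route.get("GatewayId")
                    or route.get("TransitGatewayId")
                    or "other"
                )
    return targets


def first_matching_acl_rule(entries, remote_ip, port, egress=True):
    """Return the first IPv4 ACL entry that matches TCP traffic, or None.

    entries is the "Entries" list of one item in the "NetworkAcls" list
    returned by describe_network_acls. Entries are evaluated in ascending
    RuleNumber order and the first match decides the outcome.
    """
    addr = ipaddress.ip_address(remote_ip)
    for entry in sorted(entries, key=lambda e: e["RuleNumber"]):
        if entry["Egress"] != egress or "CidrBlock" not in entry:
            continue
        if entry["Protocol"] not in ("-1", "6"):  # all protocols, or TCP
            continue
        if addr not in ipaddress.ip_network(entry["CidrBlock"]):
            continue
        port_range = entry.get("PortRange")
        if port_range and not port_range["From"] <= port <= port_range["To"]:
            continue
        return entry
    return None
```

Because network ACLs are stateless, an outbound allow on port 443 is not enough on its own: the inbound rules must also admit the return traffic on ephemeral ports, so it is worth running first_matching_acl_rule with egress=False for a high port as well.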

Regards, 
