Bootstrap timeout on instance creation

tjkentSI
New Contributor

I am getting the following error when an instance is created:

{
  "reason": {
    "code": "BOOTSTRAP_TIMEOUT",
    "parameters": {
      "databricks_error_message": "[id: InstanceId(i-0e552e85c37c9da2d), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-1266184353274328-a3cf37c9-8da9-4828-8cff-aa094025aad5), lastStatusChangeTime: 1689896414325, groupIdOpt Some(0),requestIdOpt Some(0720-205452-ixn0b8ac-78df74c0-44b8-4eed-9),version 2] with threshold 700 seconds timed out after 708057 milliseconds. Please check network connectivity from the data plane to the control plane.",
      "instance_id": "i-0e552e85c37c9da2d"
    }
  },
  "add_node_failure_details": {
    "failure_count": 1,
    "resource_type": "container",
    "will_retry": false
  }
}
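To make the numbers in that message easier to read, here is a minimal Python sketch that pulls out the instance ID, status, threshold, and elapsed time. The regular expressions are based only on the one message above, and treating `lastStatusChangeTime` as epoch milliseconds in UTC is an assumption, not something the error states. The function name `summarize_bootstrap_timeout` is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Message copied from the error above.
MESSAGE = (
    "[id: InstanceId(i-0e552e85c37c9da2d), status: INSTANCE_INITIALIZING, "
    "workerEnvId:WorkerEnvId(workerenv-1266184353274328-a3cf37c9-8da9-4828-8cff-aa094025aad5), "
    "lastStatusChangeTime: 1689896414325, groupIdOpt Some(0),"
    "requestIdOpt Some(0720-205452-ixn0b8ac-78df74c0-44b8-4eed-9),version 2] "
    "with threshold 700 seconds timed out after 708057 milliseconds. "
    "Please check network connectivity from the data plane to the control plane."
)

# Patterns inferred from this single message; other messages may differ.
PATTERNS = {
    "instance_id": r"InstanceId\((i-[0-9a-f]+)\)",
    "status": r"status: (\w+)",
    "last_status_change_ms": r"lastStatusChangeTime: (\d+)",
    "threshold_s": r"threshold (\d+) seconds",
    "elapsed_ms": r"timed out after (\d+) milliseconds",
}


def summarize_bootstrap_timeout(message: str) -> dict:
    """Extract the key fields from a BOOTSTRAP_TIMEOUT message (hypothetical helper)."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        fields[name] = match.group(1) if match else None

    # Convert the numeric fields that were found.
    for key in ("last_status_change_ms", "threshold_s", "elapsed_ms"):
        if fields[key] is not None:
            fields[key] = int(fields[key])

    # How far past the threshold the bootstrap ran, in milliseconds.
    if fields["threshold_s"] is not None and fields["elapsed_ms"] is not None:
        fields["overrun_ms"] = fields["elapsed_ms"] - fields["threshold_s"] * 1000

    # Assumption: the timestamp is epoch milliseconds in UTC.
    if fields["last_status_change_ms"] is not None:
        fields["last_status_change_utc"] = datetime.fromtimestamp(
            fields["last_status_change_ms"] // 1000, tz=timezone.utc
        ).isoformat()
    return fields


summary = summarize_bootstrap_timeout(MESSAGE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The status stays at INSTANCE_INITIALIZING for the whole window, so the instance launched but never finished bootstrapping before the 700 second threshold.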

This is from the system log for that instance.

[Bootstrap Event] DNS output for databricks-prod-artifacts-us-east-1.s3.amazonaws.com: 
Server:		10.10.0.2
Address:	10.10.0.2#53

Non-authoritative answer:
databricks-prod-artifacts-us-east-1.s3.amazonaws.com	canonical name = s3-1-w.amazonaws.com.
s3-1-w.amazonaws.com	canonical name = s3-w.us-east-1.amazonaws.com.
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.217.164.81
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.217.197.9
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.217.236.193
Name:	s3-w.us-east-1.amazonaws.com
Address: 54.231.160.105
Name:	s3-w.us-east-1.amazonaws.com
Address: 54.231.166.201
Name:	s3-w.us-east-1.amazonaws.com
Address: 3.5.8.156
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.216.112.83
Name:	s3-w.us-east-1.amazonaws.com
Address: 52.217.112.113

[Bootstrap Event] Can reach databricks-prod-artifacts-us-east-1.s3.amazonaws.com: [FAILED]
[Bootstrap Event] DNS output for databricks-prod-artifacts-us-west-2.s3.us-west-2.amazonaws.com: 
Server:		10.10.0.2
Address:	10.10.0.2#53

Non-authoritative answer:
databricks-prod-artifacts-us-west-2.s3.us-west-2.amazonaws.com	canonical name = s3-r-w.us-west-2.amazonaws.com.
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.192.26
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.194.242
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.196.74
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.209.98
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.218.192.177
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 3.5.82.203
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.128.130
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.138.58

[  252.523636] audit: kauditd hold queue overflow
[  252.598843] audit: kauditd hold queue overflow
[  252.639047] audit: kauditd hold queue overflow
[Bootstrap Event] Can reach databricks-prod-artifacts-us-west-2.s3.us-west-2.amazonaws.com: [FAILED]
[Bootstrap Event] DNS output for databricks-update-oregon.s3.us-west-2.amazonaws.com: 
Server:		10.10.0.2
Address:	10.10.0.2#53

Non-authoritative answer:
databricks-update-oregon.s3.us-west-2.amazonaws.com	canonical name = s3-r-w.us-west-2.amazonaws.com.
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.218.133.106
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.218.153.81
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.218.196.49
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 3.5.80.138
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 3.5.84.111
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.195.66
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.210.114
Name:	s3-r-w.us-west-2.amazonaws.com
Address: 52.92.240.218

[Bootstrap Event] Can reach databricks-update-oregon.s3.us-west-2.amazonaws.com: [FAILED]
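Since the DNS output is long, a short Python sketch like the one below can reduce the system log to just the reachability results. The line pattern is taken from the three `Can reach` lines above; `[FAILED]` is the only status that appears in this log, so any other status value is an assumption. The names `reachability_results` and `failed_hosts` are hypothetical.

```python
import re

# Pattern inferred from the "[Bootstrap Event] Can reach <host>: [<STATUS>]" lines above.
REACH_LINE = re.compile(r"\[Bootstrap Event\] Can reach (\S+): \[(\w+)\]")

# A few lines copied from the system log above, including unrelated kernel noise.
SAMPLE_LOG = """\
[Bootstrap Event] Can reach databricks-prod-artifacts-us-east-1.s3.amazonaws.com: [FAILED]
[  252.523636] audit: kauditd hold queue overflow
[Bootstrap Event] Can reach databricks-prod-artifacts-us-west-2.s3.us-west-2.amazonaws.com: [FAILED]
[Bootstrap Event] Can reach databricks-update-oregon.s3.us-west-2.amazonaws.com: [FAILED]
"""


def reachability_results(log_text: str) -> dict:
    """Map each host in a 'Can reach' line to its reported status."""
    return {host: status for host, status in REACH_LINE.findall(log_text)}


def failed_hosts(log_text: str) -> list:
    """Return the hosts whose reachability check reported FAILED."""
    return [
        host
        for host, status in reachability_results(log_text).items()
        if status == "FAILED"
    ]


for host in failed_hosts(SAMPLE_LOG):
    print("unreachable during bootstrap:", host)
```

In this log every host resolves through DNS (server 10.10.0.2) but the reachability check that follows fails, so name resolution works from the instance while the connection itself does not.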


I was not able to SSH into the instance that was being created, so I started a new instance in the same security group and ran the following connectivity checks from it.

ubuntu@ip-10-10-81-10:~$ nc -vz databricks-update-oregon.s3.us-west-2.amazonaws.com 443
Connection to databricks-update-oregon.s3.us-west-2.amazonaws.com (52.218.179.122) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -vz databricks-prod-artifacts-us-east-1.s3.amazonaws.com 443
Connection to databricks-prod-artifacts-us-east-1.s3.amazonaws.com (54.231.128.137) 443 port [tcp/https] succeeded!

ubuntu@ip-10-10-81-10:~$ nc -zv ireland.cloud.databricks.com 443
Connection to ireland.cloud.databricks.com (3.250.244.127) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv tunnel.eu-west-1.cloud.databricks.com 443
Connection to tunnel.eu-west-1.cloud.databricks.com (3.250.244.114) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv s3.amazonaws.com 443
Connection to s3.amazonaws.com (52.217.142.136) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv s3.eu-west-1.amazonaws.com 443
Connection to s3.eu-west-1.amazonaws.com (52.92.17.176) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv sts.amazonaws.com 443
Connection to sts.amazonaws.com (54.239.29.25) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv sts.eu-west-1.amazonaws.com 443
Connection to sts.eu-west-1.amazonaws.com (54.239.32.126) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv kinesis.eu-west-1.amazonaws.com 443
Connection to kinesis.eu-west-1.amazonaws.com (99.80.34.206) 443 port [tcp/https] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -zv md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com 3306
Connection to md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com (54.73.70.178) 3306 port [tcp/mysql] succeeded!
ubuntu@ip-10-10-81-10:~$ nc -uzv 3.250.244.112 443
Connection to 3.250.244.112 443 port [udp/https] succeeded!
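For repeating the TCP checks above in one pass, here is a minimal Python sketch that does roughly what `nc -vz` does for each host and port. The target list is copied from the commands above; the function names `check_tcp` and `report` and the 5 second timeout are my own choices. It covers TCP only: UDP is connectionless, so a "succeeded" result from `nc -uzv` does not confirm that anything answered on that port.

```python
import socket

# Hosts and ports copied from the nc commands above (TCP checks only).
TARGETS = [
    ("databricks-update-oregon.s3.us-west-2.amazonaws.com", 443),
    ("databricks-prod-artifacts-us-east-1.s3.amazonaws.com", 443),
    ("ireland.cloud.databricks.com", 443),
    ("tunnel.eu-west-1.cloud.databricks.com", 443),
    ("s3.amazonaws.com", 443),
    ("s3.eu-west-1.amazonaws.com", 443),
    ("sts.amazonaws.com", 443),
    ("sts.eu-west-1.amazonaws.com", 443),
    ("kinesis.eu-west-1.amazonaws.com", 443),
    ("md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com", 3306),
]


def check_tcp(host: str, port: int, timeout: float = 5.0):
    """Try a TCP connection; return (ok, detail) without raising."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            peer_ip = conn.getpeername()[0]
        return True, f"connected to {peer_ip}"
    except OSError as exc:
        # Covers DNS failures, refused connections, and timeouts.
        return False, f"{type(exc).__name__}: {exc}"


def report(targets, timeout: float = 5.0):
    """Check every (host, port) pair and return one summary line per target."""
    lines = []
    for host, port in targets:
        ok, detail = check_tcp(host, port, timeout)
        status = "OK" if ok else "FAILED"
        lines.append(f"{status:6} {host}:{port} ({detail})")
    return lines
```

Running `print("\n".join(report(TARGETS)))` on the test instance should give one line per endpoint. Note that this only proves a TCP handshake completes from that particular instance, the same as the `nc` output above.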

I also tried allowing all inbound and outbound TCP and UDP traffic in the security group, but instance creation still failed. I am looking for guidance on how to dig deeper into this issue.

Thanks.