
Problem with spinning up a cluster on a new workspace

User16826988699
New Contributor

Error: Please check network connectivity from the data plane to the control plane.

{
 "reason": {
   "code": "BOOTSTRAP_TIMEOUT",
   "parameters": {
     "databricks_error_message": "[id: InstanceId(i-0457092c), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-1118642488485491-ec0d76eb-7d7f-4589), lastStatusChangeTime: 1643738776621, groupIdOpt None,requestIdOpt Some(0201-162313-sfd3cke4),version 0] with threshold 700 seconds timed out after 701101 milliseconds. Please check network connectivity from the data plane to the control plane.",
     "instance_id": "i-0457092d46c635b7a"
   }
 }
}


Accepted Solutions

User16725394280
Contributor II

Can you please collect the system logs from the AWS EC2 console as soon as the cluster fails? The system log for the failed instance remains accessible from the AWS console for up to an hour after shutdown. After that, the AWS console clears its references to terminated instances.

To collect the system logs:

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

2. In the left navigation pane, choose Instances, and select the instance by its instance ID. The instance ID, which starts with i-xxxxxx, is printed in the “Event Log” section of the cluster details page in the Databricks workspace. Note that the instance must have been terminated within the last hour; otherwise it will not show up in the list. If the cluster creation failure happened longer ago than that, restart the cluster to reproduce the error first.

3. Choose Actions > Monitor and troubleshoot > Get System Log.
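
If you prefer the command line, the console steps above can be sketched roughly as below. This is only a sketch: it assumes the AWS CLI is installed and has credentials for the account that hosts the data plane, and `extract_instance_id` and `fetch_system_log` are hypothetical helper names, not part of any Databricks or AWS tooling. The same one-hour window after termination still applies.

```shell
#!/usr/bin/env bash
# Sketch: pull the instance ID out of a Databricks cluster error payload and
# fetch that instance's system log with the AWS CLI.
# extract_instance_id and fetch_system_log are hypothetical helper names.

# Read the error JSON on stdin and print the value of "instance_id".
extract_instance_id() {
  sed -n 's/.*"instance_id"[[:space:]]*:[[:space:]]*"\(i-[0-9a-f]*\)".*/\1/p'
}

# Print the system (console) log for an instance.
# Assumes the AWS CLI is installed and configured with valid credentials.
fetch_system_log() {
  local instance_id="$1" region="$2"
  aws ec2 get-console-output \
    --instance-id "$instance_id" \
    --region "$region" \
    --output text
}

# Usage (error.json is a file holding the error payload from the Event Log):
#   id=$(extract_instance_id < error.json)
#   fetch_system_log "$id" eu-west-1
```

The functions are only defined here, so nothing contacts AWS until you call `fetch_system_log` yourself.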

The commands below check network connectivity to the required endpoints. I have used eu-west-1 as a sample region; change the hostnames according to your region.

# verify access to the webapp 

nc -zv ireland.cloud.databricks.com 443

# verify access to the tunnel 

nc -zv tunnel.eu-west-1.cloud.databricks.com 443

# verify S3 global and regional access

nc -zv s3.amazonaws.com 443

nc -zv s3.eu-west-1.amazonaws.com 443

# verify STS global and regional access

nc -zv sts.amazonaws.com 443

nc -zv sts.eu-west-1.amazonaws.com 443

# verify regional kinesis access 

nc -zv kinesis.eu-west-1.amazonaws.com 443

# verify metastore access 

nc -zv md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com 3306

# control plane infra CIDR range check (verify with docs page for ip range)

nc -uzv 3.250.244.112 443
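
To avoid retyping the regional AWS hostnames for another region, the checks above could be wrapped in a small script along these lines. This is a minimal sketch: `regional_endpoints` and `check_endpoints` are hypothetical helper names, and it only generates the S3, STS, and Kinesis hostnames, which follow the `service.region.amazonaws.com` pattern. The Databricks webapp, tunnel, and metastore hostnames and the control plane IP differ per region and must still be taken from the docs.

```shell
#!/usr/bin/env bash
# Sketch: build the regional AWS endpoints for a region and probe each with nc.
# regional_endpoints and check_endpoints are hypothetical helper names.

# Print "host port" pairs for the regional S3, STS, and Kinesis endpoints.
regional_endpoints() {
  local region="$1"
  local service
  for service in s3 sts kinesis; do
    printf '%s.%s.amazonaws.com 443\n' "$service" "$region"
  done
}

# Read "host port" pairs on stdin and report which ones accept a TCP
# connection within 5 seconds. Returns non-zero if any probe fails.
check_endpoints() {
  local host port failed=0
  while read -r host port; do
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
      echo "OK   $host:$port"
    else
      echo "FAIL $host:$port"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (run from an EC2 instance in the data plane subnet):
#   regional_endpoints eu-west-1 | check_endpoints
```

Run it from an instance inside the data plane subnet, since results from any other network say nothing about the cluster's connectivity.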

Please also go through the documents below:

https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#...

https://docs.databricks.com/administration-guide/account-api/aws-storage.html


2 REPLIES

Ravi
Valued Contributor

Since you are seeing the "BOOTSTRAP_TIMEOUT" error in a new workspace, you will need to review and correct your AWS network configuration. If your workspace is configured with a customer-managed VPC, check that the routes are valid and that the NAT gateway and internet gateway (IGW) are configured correctly. To troubleshoot further, you can deploy an EC2 instance in the Databricks data plane subnet and try to reach the internet from it.

https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html

