Databricks

User16826988699 · 02-01-2022

Error: Please check network connectivity from the data plane to the control plane.

{
  "reason": {
    "code": "BOOTSTRAP_TIMEOUT",
    "parameters": {
      "databricks_error_message": "[id: InstanceId(i-0457092c), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-1118642488485491-ec0d76eb-7d7f-4589), lastStatusChangeTime: 1643738776621, groupIdOpt None,requestIdOpt Some(0201-162313-sfd3cke4),version 0] with threshold 700 seconds timed out after 701101 milliseconds. Please check network connectivity from the data plane to the control plane.",
      "instance_id": "i-0457092d46c635b7a"
    }
  }
}

User16725394280 · 02-09-2022

Can you please collect the system logs from the AWS EC2 console as soon as the cluster fails? The system log for the failed instance is accessible from the AWS console for up to an hour after shutdown.

After that, the AWS console clears the references to terminated instances.

Follow these steps to collect the system log:

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

2. In the left navigation pane, choose Instances, and select the instance by its instance ID. The instance ID, which starts with i-xxxxxx, is printed in the “Event Log” section of the cluster details page in the Databricks workspace. Note that the instance must have been terminated within the last hour; otherwise, it will not show up in the list. If the cluster creation failure happened a long time ago, restart the cluster to reproduce the error first.

3. Choose Actions > Monitor and troubleshoot > Get System Log.
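If you prefer a shell to the console, the same system log can likely be fetched with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that allow `ec2:GetConsoleOutput`; the function name `fetch_system_log` is hypothetical, and eu-west-1 is only a default to change for your region:

```shell
# Hypothetical helper: print the EC2 system (console) log for one instance.
# Assumes the AWS CLI is installed and allowed to call ec2:GetConsoleOutput.
fetch_system_log() {
  if [ "$#" -lt 1 ]; then
    echo "usage: fetch_system_log INSTANCE_ID [REGION]" >&2
    return 2
  fi
  local instance_id="$1"
  local region="${2:-eu-west-1}"   # example region; change to yours
  # The CLI decodes the base64 Output field; --query/--output text print it as plain text.
  aws ec2 get-console-output \
    --instance-id "$instance_id" \
    --region "$region" \
    --query Output \
    --output text
}
```

For example, `fetch_system_log i-0457092d46c635b7a eu-west-1 > system-log.txt` saves the log to a file. The one-hour window mentioned at the top of this reply most likely applies here too, so run it soon after the failure.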

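Here I have taken eu-west-1 as an example; change the region and the region-specific hostnames in the commands below to match your workspace. Run the checks from a host in the data plane subnet (for example, an EC2 instance launched there) so that they test the same network path as the cluster nodes.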

# verify access to the webapp
nc -zv ireland.cloud.databricks.com 443

# verify access to the tunnel
nc -zv tunnel.eu-west-1.cloud.databricks.com 443

# verify S3 global and regional access
nc -zv s3.amazonaws.com 443
nc -zv s3.eu-west-1.amazonaws.com 443

# verify STS global and regional access
nc -zv sts.amazonaws.com 443
nc -zv sts.eu-west-1.amazonaws.com 443

# verify regional kinesis access
nc -zv kinesis.eu-west-1.amazonaws.com 443

# verify metastore access
nc -zv md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com 3306

# control plane infrastructure IP check over TCP (confirm the IP range for your region on the docs page)
nc -zv 3.250.244.112 443
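To run all of these checks in one pass and get a pass/fail summary, the commands can be wrapped in a loop. This is a minimal sketch, assuming a netcat build that supports `-z` and `-w` (the OpenBSD variant does); `check_endpoints` is a hypothetical helper name, and the targets you pass should be your own regional hostnames, metastore host, and control plane IP:

```shell
# Hypothetical helper: test TCP reachability of each HOST:PORT argument.
# Prints OK or FAIL per target and returns 1 if any target is unreachable.
check_endpoints() {
  local failed=0 target host port
  for target in "$@"; do
    host="${target%:*}"    # everything before the last colon
    port="${target##*:}"   # everything after the last colon
    # -z: connect without sending data; -w 5: give up after 5 seconds
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
      echo "OK   $host:$port"
    else
      echo "FAIL $host:$port"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_endpoints ireland.cloud.databricks.com:443 tunnel.eu-west-1.cloud.databricks.com:443 s3.eu-west-1.amazonaws.com:443` prints one line per target, and a non-zero exit status means at least one target was unreachable. The loop uses TCP only, which matches how the cluster nodes connect on port 443.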

Please also go through the following documents:

https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#...

https://docs.databricks.com/administration-guide/account-api/aws-storage.html


Ravi · 02-01-2022

Since you are seeing the "BOOTSTRAP_TIMEOUT" error in a new workspace, the fix will be in your AWS network configuration. If your workspace uses a customer-managed VPC, check that the route tables are valid and that the NAT gateway and internet gateway (IGW) are configured correctly. To troubleshoot further, you can deploy an EC2 instance in the Databricks data plane subnet and check whether it can reach the internet.
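The routing pieces mentioned here can also be inspected from the AWS CLI before launching a test instance. This is a minimal sketch, assuming the AWS CLI is installed with read access to EC2; `show_subnet_routing` is a hypothetical helper, and the subnet and VPC IDs you pass are your own data plane subnet and workspace VPC:

```shell
# Hypothetical helper: summarize routing for a Databricks data plane subnet.
# Assumes the AWS CLI is installed with ec2:Describe* permissions.
show_subnet_routing() {
  if [ "$#" -lt 2 ]; then
    echo "usage: show_subnet_routing SUBNET_ID VPC_ID [REGION]" >&2
    return 2
  fi
  local subnet_id="$1" vpc_id="$2" region="${3:-eu-west-1}"

  # Routes in the route table explicitly associated with the subnet
  # (empty output means the subnet uses the VPC's main route table).
  aws ec2 describe-route-tables --region "$region" \
    --filters "Name=association.subnet-id,Values=$subnet_id" \
    --query 'RouteTables[].Routes[]' --output table

  # NAT gateways in the VPC and their state (expect "available").
  aws ec2 describe-nat-gateways --region "$region" \
    --filter "Name=vpc-id,Values=$vpc_id" \
    --query 'NatGateways[].[NatGatewayId,SubnetId,State]' --output table

  # Internet gateways attached to the VPC.
  aws ec2 describe-internet-gateways --region "$region" \
    --filters "Name=attachment.vpc-id,Values=$vpc_id" \
    --query 'InternetGateways[].InternetGatewayId' --output table
}
```

In a typical layout, the data plane subnet's route table sends 0.0.0.0/0 to a NAT gateway, and the NAT gateway's own subnet sends 0.0.0.0/0 to the internet gateway; a missing or blackholed route in either table would prevent the nodes from reaching the control plane.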

https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html
