Databricks Community

Sahha_Krishna · 07-27-2023

Unable to start a cluster in AWS-hosted Databricks because of the error below

{
  "reason": {
    "code": "BOOTSTRAP_TIMEOUT",
    "parameters": {
      "databricks_error_message": "[id: InstanceId(i-0634ee9c2d420edc8), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-1653812411865398-41432fd9-c426-43ee-84fd-f161341ac1db), lastStatusChangeTime: 1690489336310, groupIdOpt Some(0),requestIdOpt Some(0717-085009-xgjqxcqe-70e3f1a0-0524-43cc-b),version 1] with threshold 700 seconds timed out after 701058 milliseconds. Please check network connectivity from the data plane to the control plane.",
      "instance_id": "i-0634ee9c2d420edc8"
    }
  },
  "add_node_failure_details": {
    "failure_count": 2,
    "resource_type": "container",
    "will_retry": false
  }
}

It all started when we set up a peering connection between the Databricks default VPC and our VPC. We rolled back all the changes, but the problem persists. In AWS, I can see that the EC2 instances are initialized and running, but something else is still wrong.

Any help would be greatly appreciated.

Harrison_S · 09-22-2023

Hi Sahha,

It may be a DNS issue if the DNS settings weren't rolled back along with the peering connection. Can you check the troubleshooting section of the VPC peering documentation and confirm that those configurations were rolled back as well? https://docs.databricks.com/en/administration-guide/cloud-configurations/aws/vpc-peering.html#troubl...
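
If it helps narrow things down, here is a rough sketch of a check you could run from an EC2 instance in the affected VPC to see whether DNS resolution or the TCP connection is the step that fails. It uses only the Python standard library. The function name check_endpoint and the hostname in the example are placeholders I made up for illustration, so substitute your own workspace URL; port 443 is assumed because the workspace is reached over HTTPS. It only tests reachability of one host and port, so a passing result does not rule out other network problems.

```python
import socket


def check_endpoint(host, port=443, timeout=5.0):
    """Resolve `host`, then try a TCP connection, and report which step failed.

    Hypothetical helper for illustration; not part of any Databricks tooling.
    """
    result = {
        "host": host,
        "port": port,
        "addresses": [],
        "dns_ok": False,
        "tcp_ok": False,
        "error": None,
    }

    # Step 1: DNS resolution. A failure here points at resolver or DNS settings.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"DNS lookup failed: {exc}"
        return result
    result["addresses"] = sorted({info[4][0] for info in infos})
    result["dns_ok"] = True

    # Step 2: TCP connection. A failure here points at routing, security
    # groups, network ACLs, or a firewall rather than DNS.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result


if __name__ == "__main__":
    # Placeholder hostname: replace with your own workspace URL.
    print(check_endpoint("your-workspace.cloud.databricks.com"))
```

If dns_ok comes back False, the DNS configuration is the first thing to look at. If dns_ok is True but tcp_ok is False, the route tables, security groups, and network ACLs are more likely suspects.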