Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Bootstrap Timeout during cluster start on AWS cloud

Manimkm08
New Contributor III

I sometimes get the error below when the cluster starts. I have attached the AWS system log for the instance mentioned below. In recent days I have been getting this error very frequently. I have seen that the same error was reported earlier and was marked as resolved by the team.

Error:

{
  "reason": {
    "code": "BOOTSTRAP_TIMEOUT",
    "parameters": {
      "databricks_error_message": "[id: InstanceId(i-0a2c7c58a6ffcb69f), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-1009608583880808-17b7de8b-3026-44d9-ad08-2cc4776d9067), lastStatusChangeTime: 1667548831248, groupIdOpt Some(-8704258090982298271),requestIdOpt Some(1104-080005-ioy9s8op-62e7ec83-2289-45b8-8),version 2] with threshold 700 seconds timed out after 700519 milliseconds. Please check network connectivity from the data plane to the control plane.",
      "instance_id": "i-0a2c7c58a6ffcb69f"
    }
  }
}
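
When the failure is intermittent, it can help to pull the timing fields out of the message and line them up with the AWS system log. The snippet below is only a rough sketch: the function name `summarize_bootstrap_timeout` is made up for illustration, the message is abridged from the error above, and treating `lastStatusChangeTime` as epoch milliseconds is an assumption based on its magnitude.

```python
import re
from datetime import datetime, timezone

# Abridged copy of the databricks_error_message shown above.
MESSAGE = (
    "[id: InstanceId(i-0a2c7c58a6ffcb69f), status: INSTANCE_INITIALIZING, "
    "lastStatusChangeTime: 1667548831248] "
    "with threshold 700 seconds timed out after 700519 milliseconds."
)


def _first_group(pattern, text):
    """Return the first capture group of pattern in text, or None if absent."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def summarize_bootstrap_timeout(message):
    """Hypothetical helper: extract instance id, status and timings."""
    changed_ms = _first_group(r"lastStatusChangeTime: (\d+)", message)
    threshold_s = _first_group(r"threshold (\d+) seconds", message)
    elapsed_ms = _first_group(r"timed out after (\d+) milliseconds", message)

    summary = {
        "instance_id": _first_group(r"InstanceId\((i-[0-9a-f]+)\)", message),
        "status": _first_group(r"status: (\w+)", message),
        "last_status_change_utc": None,
        "overshoot_ms": None,
    }
    if changed_ms is not None:
        # Assumption: the value is milliseconds since the Unix epoch.
        when = datetime.fromtimestamp(int(changed_ms) // 1000, tz=timezone.utc)
        summary["last_status_change_utc"] = when.isoformat()
    if threshold_s is not None and elapsed_ms is not None:
        # How far past the configured threshold the wait ran.
        summary["overshoot_ms"] = int(elapsed_ms) - int(threshold_s) * 1000
    return summary


print(summarize_bootstrap_timeout(MESSAGE))
```

Under that assumption, the timestamp gives a UTC time to search for in the attached instance log.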

4 REPLIES

karthik_p
Esteemed Contributor

@Mani Srini Are you seeing this error in a newly configured environment, or in one that was previously working? Have there been any changes on the cloud VPC side? This looks like a connectivity issue between the data plane and the control plane. Please make sure all required ports are open for communication between the Databricks control plane and the customer data plane.
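
One way to test the port suggestion above is to try opening TCP connections from a machine in the same subnet as the cluster nodes. This is a minimal sketch, not an official diagnostic: `can_connect` and `check_ports` are hypothetical helper names, and the host and port values in the usage comment are placeholders that should be replaced with the endpoints and ports listed in the Databricks documentation for your region.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_ports(host, ports, timeout=5.0):
    """Map each port to whether a TCP connection could be opened."""
    return {port: can_connect(host, port, timeout) for port in ports}


# Example usage (placeholder host and ports, replace with real values):
# print(check_ports("control-plane.example.invalid", [443, 6666]))
```

Because the error is intermittent, a single successful run does not prove much; running the check repeatedly around the time of a failed cluster start is more informative.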

Manimkm08
New Contributor III

@karthik p @Kaniz Fatma I haven't seen the error in the past three days, and I haven't made any changes on my end. I'm wondering why it happened earlier but not in recent days. Could you please tell me which ports need to be open to avoid this issue in future?

Manimkm08
New Contributor III

@Kaniz Fatma @karthik p Since this morning I am facing the issue again. It seems to be intermittent, and it fails the pipeline in the middle of the ETL process. I haven't been able to find the exact root cause. Can someone suggest a workaround or solution?

karthik_p
Esteemed Contributor

@Mani Srini This kind of issue can usually be resolved by validating the security groups and the route table configuration of the VPC used for the Databricks instance. Please check that all the prerequisites in the documentation have been met, in particular the subnet route table configuration and the security groups: Customer-managed VPC | Databricks on AWS
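
For the security group part of the check above, the egress rules can be inspected programmatically. The sketch below assumes a dictionary shaped like one entry of `SecurityGroups` in the response of boto3's `describe_security_groups`. The helper name `egress_allows_port` and the sample group are made up for illustration. It only looks at protocol and port range and ignores the destination CIDR, so it is a first-pass filter rather than a full validation.

```python
def egress_allows_port(security_group, port, protocol="tcp"):
    """Return True if any egress rule covers the given protocol and port.

    Assumption: security_group has the shape of one SecurityGroups entry
    from boto3's describe_security_groups. Destinations are not checked.
    """
    for rule in security_group.get("IpPermissionsEgress", []):
        rule_protocol = rule.get("IpProtocol")
        if rule_protocol == "-1":
            # "-1" means all protocols and all ports.
            return True
        if rule_protocol != protocol:
            continue
        low = rule.get("FromPort", 0)
        high = rule.get("ToPort", 65535)
        if low <= port <= high:
            return True
    return False


# Made-up sample group that only allows outbound HTTPS.
sample_group = {
    "GroupId": "sg-hypothetical",
    "IpPermissionsEgress": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443},
    ],
}

print(egress_allows_port(sample_group, 443))
print(egress_allows_port(sample_group, 6666))
```

The route table side (a default route to a NAT gateway or equivalent for the cluster subnets) still needs to be checked separately.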
