Data Engineering

Customer-managed VPC with empty default security group

amitca71
Contributor II

Hi,

I use a customer-managed VPC. When I use the default security group, the job clusters are created successfully.

However, if I empty the default security group (SOC 2 requires the default to be empty) and create a custom security group with the same rules, i.e. ingress allowing all ports and protocols within the security group and egress allowing all traffic to 0.0.0.0/0, cluster creation fails.

This is the Terraform I use:

resource "aws_security_group" "databricks_infrastructure_sg" {
  name        = "databricks_infra_sg"
  description = "internal ingress"
  vpc_id      = module.vpc.vpc_id

  # Ingress: traffic from the VPC CIDR and from members of this security group
  ingress {
    description = "Allow all internal TCP and UDP"
    from_port   = 0
    to_port     = 65535
    protocol    = "All"
    cidr_blocks = [module.vpc.vpc_cidr_block]
    self        = true
  }

  # Egress: traffic to any destination
  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "All"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

I get the following error on the job tasks:

 Unexpected failure while waiting for the cluster Some((01xxxxxx) )to be readySome(: Cluster 0131-154653-qtv0d3wx is in unexpected state Terminated: BOOTSTRAP_TIMEOUT(SUCCESS)databricks_error_message:[id: InstanceId(i-07xxxx501), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-20xxxxxx2-xxxx-***-4292-xx-xxxx), lastStatusChangeTime: 1675180073933, groupIdOpt Some(0),requestIdOpt Some(xxxxxxxxx),version 1] with threshold 700 seconds timed out after 700726 milliseconds. Please check network connectivity from the data plane to the control plane.,instance_id:i-xxxxxxxx.)

I couldn't find any documentation that covers this.

Any ideas?

Thanks,

Amit

6 REPLIES

Debayan
Esteemed Contributor III

Hi, this is a typical network configuration error. The rules in the custom security group need to be re-verified. Please refer to the thread below and let us know if it helps.

https://community.databricks.com/s/question/0D53f00001fR8LGCA0/problem-with-spinning-up-a-cluster-on...
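When re-verifying the rules, it may help to compare against a minimal sketch of the rule set described in the question: all traffic allowed within the security group, and all outbound traffic allowed. The resource name `databricks_custom_sg` is hypothetical, and `module.vpc` is assumed to be defined elsewhere, as in the original post. The AWS provider documentation describes `"-1"` as the protocol value for all protocols and requires `from_port` and `to_port` to be 0 with it, so this sketch uses that form rather than `"All"`.

```hcl
# Sketch only, not a verified fix for the error in this thread.
# "databricks_custom_sg" is a hypothetical name; module.vpc must exist elsewhere.
resource "aws_security_group" "databricks_custom_sg" {
  name        = "databricks_custom_sg"
  description = "Databricks cluster traffic"
  vpc_id      = module.vpc.vpc_id

  # Allow all protocols and ports between members of this security group.
  ingress {
    description = "All traffic within the security group"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    self        = true
  }

  # Allow all outbound traffic, including to the Databricks control plane.
  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

This is deliberately as permissive as the rules in the question. A stricter egress list is possible, but the exact ports should be taken from the current Databricks documentation for customer-managed VPCs.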

amitca71
Contributor II

Hi @Debayan Mukherjee, I added the security group to MWS. Now I get a different error:

{
  "reason": {
    "code": "SECURITY_DAEMON_REGISTRATION_EXCEPTION",
    "type": "CLIENT_ERROR",
    "parameters": {
      "instance_id": "i-04ef78a9000a86819",
      "databricks_error_message": "Failed to set up the Spark container due to an error when registering the container to security daemon."
    }
  }
}

thanks,

Amit
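The thread does not show how the security group was added to MWS. With the Databricks Terraform provider, one common way is the `security_group_ids` argument of a `databricks_mws_networks` resource. The following is a minimal sketch under that assumption; the variable `databricks_account_id`, the network name, and the `private_subnets` output of `module.vpc` are placeholders that do not appear in the original post.

```hcl
# Sketch only: assumes an account-level Databricks provider is already configured.
variable "databricks_account_id" {
  type        = string
  description = "Databricks account ID (placeholder for illustration)"
}

resource "databricks_mws_networks" "this" {
  account_id   = var.databricks_account_id
  network_name = "custom-sg-network" # hypothetical name
  vpc_id       = module.vpc.vpc_id

  # Assumes module.vpc exposes a private_subnets output; adjust to your module.
  subnet_ids = module.vpc.private_subnets

  # Register the custom security group instead of the VPC default one.
  security_group_ids = [aws_security_group.databricks_infrastructure_sg.id]
}
```

If the workspace network configuration still points at the emptied default security group, cluster instances would have no allowed traffic at all, which would be consistent with the bootstrap timeout reported in the question.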

Debayan
Esteemed Contributor III

Hi Amit, please confirm whether you have checked the security group rules and configuration.

amitca71
Contributor II

It started working. I guess there was a temporary issue within AWS?

Thanks @Debayan Mukherjee

Debayan
Esteemed Contributor III

Yes, possible. Thanks for your confirmation.

jose_gonzalez
Moderator

Hi @Amit Cahanovich,

Just a friendly follow-up. Did any of the responses help resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.
