
Customer-managed VPC with empty default security group

amitca71
Contributor II

Hi,

I use a customer-managed VPC. When I use the default security group, the job clusters are created successfully.

SOC 2 requires the default security group to be empty. When I empty it and create a custom security group with the same ingress rule (allow all ports and protocols within the security group) and egress rule (allow all traffic to 0.0.0.0/0), the job clusters fail to start.

I am using this Terraform:

resource "aws_security_group" "databricks_infrastructure_sg" {
  name        = "databricks_infra_sg"
  description = "internal ingress"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "Allow all internal TCP and UDP"
    from_port   = 0
    to_port     = 65535
    protocol    = "All"
    cidr_blocks = [module.vpc.vpc_cidr_block]
    self        = true
  }

  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "All"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

I get the following error on the job tasks:

 Unexpected failure while waiting for the cluster Some((01xxxxxx) )to be readySome(: Cluster 0131-154653-qtv0d3wx is in unexpected state Terminated: BOOTSTRAP_TIMEOUT(SUCCESS)databricks_error_message:[id: InstanceId(i-07xxxx501), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-20xxxxxx2-xxxx-***-4292-xx-xxxx), lastStatusChangeTime: 1675180073933, groupIdOpt Some(0),requestIdOpt Some(xxxxxxxxx),version 1] with threshold 700 seconds timed out after 700726 milliseconds. Please check network connectivity from the data plane to the control plane.,instance_id:i-xxxxxxxx.)

I couldn't find any documentation that covers this.

Any idea?

Thanks,

Amit
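
For comparison, the Terraform AWS provider documents "-1" as the protocol value that means all protocols, and requires from_port and to_port to be 0 when it is used. A minimal sketch of the same group written that way follows. The resource label and group name here are hypothetical, and it assumes the same module.vpc outputs as the post above. It is not a confirmed fix for the error in this thread.

```hcl
# Sketch only: same intent as the group above, using the provider's documented
# "-1" (all protocols) form. The label "databricks_infra_sg_alt" and the name
# "databricks_infra_sg_alt" are hypothetical.
resource "aws_security_group" "databricks_infra_sg_alt" {
  name        = "databricks_infra_sg_alt"
  description = "internal ingress"
  vpc_id      = module.vpc.vpc_id

  # Allow all traffic from the VPC CIDR and from members of this group.
  ingress {
    description = "Allow all internal traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [module.vpc.vpc_cidr_block]
    self        = true
  }

  # Allow all outbound traffic.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```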

6 REPLIES 6

Debayan
Esteemed Contributor III

Hi, this is a typical network configuration error. The rules on the custom security group need to be re-verified. Please refer to this thread and let us know if it helps.

https://community.databricks.com/s/question/0D53f00001fR8LGCA0/problem-with-spinning-up-a-cluster-on...

amitca71
Contributor II

Hi @Debayan Mukherjee, I added the security group to MWS. Now I get a different error:

{
  "reason": {
    "code": "SECURITY_DAEMON_REGISTRATION_EXCEPTION",
    "type": "CLIENT_ERROR",
    "parameters": {
      "instance_id": "i-04ef78a9000a86819",
      "databricks_error_message": "Failed to set up the Spark container due to an error when registering the container to security daemon."
    }
  }
}

Thanks,

Amit
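
The step of adding the security group to MWS can be sketched with the Databricks Terraform provider's databricks_mws_networks resource, which takes the security group IDs for a customer-managed VPC. This is an illustration under assumptions, not the poster's actual configuration: var.databricks_account_id, the network name, and module.vpc.private_subnets do not appear in the thread, and an account-level provider configuration is assumed to exist.

```hcl
# Sketch only: registers the custom security group with the workspace network.
# var.databricks_account_id, "customer-managed-network", and
# module.vpc.private_subnets are assumptions, not taken from the thread.
resource "databricks_mws_networks" "this" {
  account_id   = var.databricks_account_id
  network_name = "customer-managed-network"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnets

  # Reference the custom group instead of the (now empty) default group.
  security_group_ids = [aws_security_group.databricks_infrastructure_sg.id]
}
```

If the network configuration still references only the default security group, the cluster nodes would be launched with the emptied group, which would be consistent with the BOOTSTRAP_TIMEOUT reported in the first post.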

Debayan
Esteemed Contributor III

Hi Amit, please confirm whether you have checked the SG rules and configuration.

amitca71
Contributor II

It started working. I guess there was a temporary issue within AWS?

Thanks @Debayan Mukherjee

Debayan
Esteemed Contributor III

Yes, possible. Thanks for your confirmation.

jose_gonzalez
Moderator

Hi @Amit Cahanovich,

Just a friendly follow-up. Did any of the responses help resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
