01-31-2023 10:12 AM
Hi,
I use a self-managed VPC. When I use the default security group, the job clusters are created without problems.
SOC 2 requires the default security group to be empty, so I emptied it and created a custom security group with the same rules: ingress that allows all ports and protocols within the security group, and egress that allows all traffic to 0.0.0.0/0.
I created it with this Terraform:
resource "aws_security_group" "databricks_infrastructure_sg" {
  name        = "databricks_infra_sg"
  description = "internal ingress"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "Allow all internal TCP and UDP"
    from_port   = 0
    to_port     = 65535
    protocol    = "All"
    cidr_blocks = [module.vpc.vpc_cidr_block]
    self        = true
  }

  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "All"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
With this security group, I get the following error on the job tasks:
Unexpected failure while waiting for the cluster Some((01xxxxxx) )to be readySome(: Cluster 0131-154653-qtv0d3wx is in unexpected state Terminated: BOOTSTRAP_TIMEOUT(SUCCESS)databricks_error_message:[id: InstanceId(i-07xxxx501), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-20xxxxxx2-xxxx-***-4292-xx-xxxx), lastStatusChangeTime: 1675180073933, groupIdOpt Some(0),requestIdOpt Some(xxxxxxxxx),version 1] with threshold 700 seconds timed out after 700726 milliseconds. Please check network connectivity from the data plane to the control plane.,instance_id:i-xxxxxxxx.)
I couldn't find any documentation that covers this.
Any ideas?
Thanks,
Amit
01-31-2023 11:18 PM
Hi, this is a typical network configuration error. The rules in the custom security group need to be re-verified. Please refer to this and let us know if it helps.
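For anyone re-verifying a custom security group like the one above, here is a minimal sketch of how the rules described in the question ("allow everything within the security group, allow all outbound traffic") could be written. It relies on the AWS provider convention that `protocol = "-1"` means all protocols and must be paired with `from_port = 0` and `to_port = 0`. The resource name `databricks_workspace_sg` is hypothetical, and nothing in this thread confirms that the protocol value caused the error, so treat this as an illustration rather than the fix.

```hcl
# Sketch only: resource and rule names are illustrative, not from the thread.
resource "aws_security_group" "databricks_workspace_sg" {
  name        = "databricks_workspace_sg"
  description = "Databricks workspace traffic"
  vpc_id      = module.vpc.vpc_id

  # Allow all traffic between members of this same security group.
  ingress {
    description = "All protocols within the security group"
    from_port   = 0
    to_port     = 0
    protocol    = "-1" # all protocols; ports must both be 0
    self        = true
  }

  # Allow all outbound traffic.
  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Running `terraform plan` after a change like this shows whether the provider sees any difference from the rules that are already deployed.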
02-01-2023 01:48 AM
Hi @Debayan Mukherjee, I added the security group to MWS. Now I get a different error:
{
  "reason": {
    "code": "SECURITY_DAEMON_REGISTRATION_EXCEPTION",
    "type": "CLIENT_ERROR",
    "parameters": {
      "instance_id": "i-04ef78a9000a86819",
      "databricks_error_message": "Failed to set up the Spark container due to an error when registering the container to security daemon."
    }
  }
}
Thanks,
Amit
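For readers unsure what "adding the security group to MWS" involves: with the Databricks Terraform provider, the custom security group is typically referenced from a `databricks_mws_networks` resource. The sketch below assumes an account-level provider alias `databricks.mws`, a variable `var.databricks_account_id`, and a `private_subnets` output on the VPC module. None of these appear in the thread, so adjust them to your own configuration.

```hcl
# Sketch only: provider alias, variable, and subnet output are assumptions.
resource "databricks_mws_networks" "this" {
  provider     = databricks.mws
  account_id   = var.databricks_account_id
  network_name = "databricks-network"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnets

  # Reference the custom security group instead of the VPC default one.
  security_group_ids = [aws_security_group.databricks_infrastructure_sg.id]
}
```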
02-01-2023 09:48 PM
Hi Amit, please confirm whether you have checked the security group rules and configuration.
02-11-2023 02:39 AM
It started working. I guess there was a temporary issue within AWS?
Thanks @Debayan Mukherjee
02-12-2023 09:38 PM
Yes, possible. Thanks for your confirmation.
02-23-2023 02:31 PM
Hi @Amit Cahanovich,
Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.