User16725394280
Databricks Employee

Please collect the system logs from the AWS EC2 console as soon as the cluster fails. System logs for the failed instance are accessible from the AWS console for up to an hour after shutdown.

After that, the AWS console clears its references to the terminated instances.

Here are the steps to collect the system logs:

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

2. In the left navigation pane, choose Instances, then select the instance by its instance ID. The instance ID, which starts with i-xxxxxx, is printed in the “Event Log” section of the cluster details page in the Databricks workspace. Note that the instance must have been terminated within the last hour; otherwise it will not show up in the list. If the cluster creation failure happened a long time ago, restart the cluster to reproduce the error first.

3. Choose Actions > Monitor and troubleshoot > Get System Log.
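If you prefer the command line, the same log can usually be pulled with the AWS CLI instead of the console. This is only a sketch: it assumes the AWS CLI is installed and that your credentials allow ec2:GetConsoleOutput, and fetch_system_log is a hypothetical helper name used for illustration.

```shell
# Sketch: save an instance's system (console) log to a local file.
# Assumes the AWS CLI is installed and configured with suitable credentials.
# fetch_system_log is a hypothetical helper name, not part of any tool.
fetch_system_log() {
  instance_id="$1"   # the i-xxxxxx value from the cluster "Event Log"
  region="$2"        # for example eu-west-1
  aws ec2 get-console-output \
    --instance-id "$instance_id" \
    --region "$region" \
    --output text > "${instance_id}-system.log"
}
```

For example, fetch_system_log i-xxxxxx eu-west-1 would write the log to i-xxxxxx-system.log in the current directory. The same one-hour window after termination applies.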

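You can also verify network connectivity with the commands below, run from a machine in the same network as the cluster. I have used eu-west-1 as a sample; change the hostnames to match your region.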

# verify access to the webapp 

nc -zv ireland.cloud.databricks.com 443

# verify access to the tunnel 

nc -zv tunnel.eu-west-1.cloud.databricks.com 443

# verify S3 global and regional access

nc -zv s3.amazonaws.com 443

nc -zv s3.eu-west-1.amazonaws.com 443

# verify STS global and regional access

nc -zv sts.amazonaws.com 443

nc -zv sts.eu-west-1.amazonaws.com 443

# verify regional kinesis access 

nc -zv kinesis.eu-west-1.amazonaws.com 443

# verify metastore access 

nc -zv md15cf9e1wmjgny.cxg30ia2wqgj.eu-west-1.rds.amazonaws.com 3306

# control plane infra CIDR range check (verify with docs page for ip range)

nc -zv 3.250.244.112 443
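To avoid retyping each check when switching regions, the commands above could be wrapped in a small script. This is a rough sketch, not an official tool: aws_regional_hosts and check_hosts are hypothetical helper names, and it assumes the S3, STS, and Kinesis regional endpoints follow the service.region.amazonaws.com pattern shown above. The Databricks webapp, tunnel, and metastore hostnames differ per region, so pass those explicitly after looking them up in the docs.

```shell
# Sketch: run the TCP checks above for any region.
# aws_regional_hosts and check_hosts are hypothetical helper names.

# Print the regional S3, STS, and Kinesis hostnames for a region
# (assumes the service.region.amazonaws.com naming used above).
aws_regional_hosts() {
  region="$1"
  for service in s3 sts kinesis; do
    echo "${service}.${region}.amazonaws.com"
  done
}

# Check one port against a list of hosts; returns non-zero if any check fails.
check_hosts() {
  port="$1"
  shift
  failed=0
  for host in "$@"; do
    # -z: connect without sending data, -w 5: give up after 5 seconds
    if nc -z -w 5 "$host" "$port" </dev/null >/dev/null 2>&1; then
      echo "OK   ${host}:${port}"
    else
      echo "FAIL ${host}:${port}"
      failed=1
    fi
  done
  return "$failed"
}
```

For example: check_hosts 443 ireland.cloud.databricks.com tunnel.eu-west-1.cloud.databricks.com $(aws_regional_hosts eu-west-1). The metastore check uses port 3306, so it needs a separate call with the metastore hostname for your region.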

Please also review the following documents:

https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#...

https://docs.databricks.com/administration-guide/account-api/aws-storage.html
