03-10-2022 03:28 AM
We're developing a custom runtime for Databricks clusters. We need to version and archive our clusters for a client. We got it running successfully in our own environment, but we can't get it to work in the client's environment. It's a large corporation with many restrictions.
We're able to start the EC2 instance and pull the image, but something else must be blocking it. I think the EC2 instance is running successfully, but I get this error in Databricks:
Cluster terminated. Reason: Container launch failure
An unexpected error was encountered while launching containers on worker instances for the cluster. Please retry and contact Databricks if the problem persists.
Instance ID: i-0fb50653895453fdf
Internal error message: Failed to launch spark container on instance i-0fb50653895453fdf. Exception: Container setup has timed out
The cause is probably some setting or permission inside the client's environment.
Here is the end of the EC2 log:
-----END SSH HOST KEY KEYS-----
[ 59.876874] cloud-init[1705]: Cloud-init v. 21.4-0ubuntu1~18.04.1 running 'modules:final' at Wed, 09 Mar 2022 15:05:30 +0000. Up 17.38 seconds.
[ 59.877016] cloud-init[1705]: Cloud-init v. 21.4-0ubuntu1~18.04.1 finished at Wed, 09 Mar 2022 15:06:13 +0000. Datasource DataSourceEc2Local. Up 59.86 seconds
[ 59.819059] audit: kauditd hold queue overflow
[ 66.068641] audit: kauditd hold queue overflow
[ 66.070755] audit: kauditd hold queue overflow
[ 66.072833] audit: kauditd hold queue overflow
[ 74.733249] audit: kauditd hold queue overflow
[ 74.735227] audit: kauditd hold queue overflow
[ 74.737109] audit: kauditd hold queue overflow
[ 79.899966] audit: kauditd hold queue overflow
[ 79.903557] audit: kauditd hold queue overflow
[ 79.907108] audit: kauditd hold queue overflow
[ 89.324990] audit: kauditd hold queue overflow
[ 89.329193] audit: kauditd hold queue overflow
[ 89.333125] audit: kauditd hold queue overflow
[ 106.617320] audit: kauditd hold queue overflow
[ 106.620980] audit: kauditd hold queue overflow
[ 107.464865] audit: kauditd hold queue overflow
[ 127.175767] audit: kauditd hold queue overflow
[ 127.179897] audit: kauditd hold queue overflow
[ 127.215281] audit: kauditd hold queue overflow
[ 132.190357] audit: kauditd hold queue overflow
[ 132.193968] audit: kauditd hold queue overflow
[ 132.197546] audit: kauditd hold queue overflow
[ 156.211713] audit: kauditd hold queue overflow
[ 156.215388] audit: kauditd hold queue overflow
[ 228.558571] audit: kauditd hold queue overflow
[ 228.562120] audit: kauditd hold queue overflow
[ 228.565629] audit: kauditd hold queue overflow
[ 316.405562] audit: kauditd hold queue overflow
[ 316.409136] audit: kauditd hold queue overflow
03-10-2022 05:04 AM
That is a Linux kernel audit message (audit events are exceeding the kernel's queue limit), so, as you said, it is probably related to some security setting, for example your image not having access to some disks. But there could be thousands of other reasons as well.
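To check when the overflow messages start and stop relative to boot (for example, whether they keep appearing up to the point where the container setup times out), the pasted console output can be split back into entries. This is a minimal sketch, assuming the log is saved as plain text in the `[ seconds] message` format shown above; `split_entries` and `summarize` are hypothetical helper names, not part of any Databricks or AWS tooling, and the result only characterizes the audit messages, it does not diagnose the container failure itself.

```python
import re

# Matches a kernel console timestamp such as "[ 59.876874]", even when the
# bracket and the number were split across lines when the log was pasted.
# "cloud-init[1705]" is not matched because the pattern requires a decimal point.
TIMESTAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def split_entries(raw: str) -> list[tuple[float, str]]:
    """Split flattened console output into (uptime_seconds, message) pairs.

    Any text before the first timestamp (e.g. the SSH host key footer) is ignored.
    """
    text = " ".join(raw.split())  # collapse newlines and repeated spaces
    matches = list(TIMESTAMP_RE.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        # A message runs until the next timestamp, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((float(match.group(1)), text[match.end():end].strip()))
    return entries


def summarize(entries, needle="kauditd hold queue overflow"):
    """Return (count, first_seen, last_seen) for entries containing `needle`."""
    times = [t for t, message in entries if needle in message]
    if not times:
        return 0, None, None
    # Use min/max rather than list order: console timestamps are not always sorted.
    return len(times), min(times), max(times)
```

Reading the saved log file into a string and passing it through `split_entries` and then `summarize` gives the number of overflow messages and the uptime range in which they appeared. Using `min`/`max` instead of the first and last entries matters here, since the log above prints `[ 59.877016]` before `[ 59.819059]`.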
03-10-2022 10:10 PM
Thanks, I'll contact official Databricks support so they can look at it in more depth 🙂
04-25-2022 02:01 PM
Hi @michael henzl,
Just a friendly follow-up. Were you able to reach out to Databricks support to get help with this issue? Let us know if you have any follow-up questions.
10-07-2022 06:37 AM
@michael henzl, did you manage to solve the problem?