Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Failure during cluster launch

arkadiuszr
New Contributor III

Hi all,

I am migrating from an older Databricks deployment to Databricks E2. I moved the cluster definitions over from the old workspace and also created new ones, but Databricks tries to start a cluster for an hour and then fails. This happens in both Single Node and Standard mode.
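For anyone doing a similar migration: one way to move cluster definitions between workspaces is to pull each spec from the old workspace with the Clusters API (`GET /api/2.0/clusters/get`) and replay it against the new one (`POST /api/2.0/clusters/create`), after dropping the runtime-only fields that `get` returns but `create` does not accept. A minimal sketch, assuming a personal access token on each workspace; the field allowlist is illustrative, not exhaustive:

```python
# Sketch: copy a cluster spec between workspaces via the Clusters REST API.
# The allowlist below is illustrative; extend it for the fields your clusters use.
import json
import urllib.request

# Fields from `clusters/get` that are also valid input to `clusters/create`.
CREATE_FIELDS = {
    "cluster_name", "spark_version", "node_type_id", "driver_node_type_id",
    "num_workers", "autoscale", "aws_attributes", "spark_conf",
    "spark_env_vars", "custom_tags", "autotermination_minutes",
    "ssh_public_keys", "init_scripts",
}

def to_create_payload(get_response):
    """Drop runtime-only fields (cluster_id, state, driver, ...) so the
    spec can be POSTed to clusters/create in the new workspace."""
    return {k: v for k, v in get_response.items() if k in CREATE_FIELDS}

def copy_cluster(old_host, old_token, new_host, new_token, cluster_id):
    """Fetch a cluster spec from the old workspace and recreate it in the new one."""
    req = urllib.request.Request(
        f"{old_host}/api/2.0/clusters/get?cluster_id={cluster_id}",
        headers={"Authorization": f"Bearer {old_token}"})
    spec = json.load(urllib.request.urlopen(req))
    create = urllib.request.Request(
        f"{new_host}/api/2.0/clusters/create",
        data=json.dumps(to_create_payload(spec)).encode(),
        headers={"Authorization": f"Bearer {new_token}",
                 "Content-Type": "application/json"})
    return json.load(urllib.request.urlopen(create))
```

Note that E2 workspaces often need different `aws_attributes` (instance profile ARN, zone) than the old deployment, so those fields may need editing rather than a straight copy.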

I have also checked this topic, https://community.databricks.com/s/question/0D53f00001in5HDCAY/databricks-cluster-create-fail, but without any luck.

I don't see any AWS quotas being reached.

Cluster terminated. Reason: Unexpected launch failure

An unexpected error was encountered while setting up the cluster. Please retry and contact Databricks if the problem persists.

Internal error message: java.lang.RuntimeException: Internal error (no failure to report)
    at com.databricks.backend.manager.AddResourcesStateHelper$.<init>(AddResourcesState.scala:216)
    at com.databricks.backend.manager.AddResourcesStateHelper$.<clinit>(AddResourcesState.scala)
    at com.databricks.backend.manager.ClusterManager.shouldStopAddingNodes(ClusterManager.scala:3979)
    at com.databricks.backend.manager.ClusterManager.runAddResourceSteps(ClusterManager.scala:4125)
    at com.databricks.backend.manager.ClusterManager.addResourcesToCluster(ClusterManager.scala:4033)
    at com.databricks.backend.manager.ClusterManager.$anonfun$doAddContainersToCluster$1(ClusterManager.scala:2158)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:366)
    at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:460)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:480)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$2(UsageLogging.scala:232)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:230)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:212)
    at com.databricks.backend.manager.ClusterManager.withAttributionContext(ClusterManager.scala:147)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:261)
    at com.databricks.backend.manager.ClusterManager.withAttributionTags(ClusterManager.scala:147)
    at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:455)
    at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:375)
    at com.databricks.backend.manager.ClusterManager.recordOperationWithResultTags(ClusterManager.scala:147)
    at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:366)
    at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:338)
    at com.databricks.backend.manager.ClusterManager.recordOperation(ClusterManager.scala:147)
    at com.databricks.backend.manager.ClusterManager.doAddContainersToCluster(ClusterManager.scala:2158)
    at com.databricks.backend.manager.ClusterManager.$anonfun$doSetupCluster$3(ClusterManager.scala:542)
    at com.databricks.backend.manager.ClusterManager.withAuditLog(ClusterManager.scala:2578)
    at com.databricks.backend.manager.ClusterManager.$anonfun$doSetupCluster$2(ClusterManager.scala:496)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:366)
    at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:460)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:480)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$2(UsageLogging.scala:232)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:230)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:212)
    at com.databricks.backend.manager.ClusterManager.withAttributionContext(ClusterManager.scala:147)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:261)
    at com.databricks.backend.manager.ClusterManager.withAttributionTags(ClusterManager.scala:147)
    at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:455)
    at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:375)
    at com.databricks.backend.manager.ClusterManager.recordOperationWithResultTags(ClusterManager.scala:147)
    at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:366)
    at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:338)
    at com.databricks.backend.manager.ClusterManager.recordOperation(ClusterManager.scala:147)
    at com.databricks.backend.manager.ClusterManager.$anonfun$doSetupCluster$1(ClusterManager.scala:496)
    at com.databricks.backend.manager.ClusterManager.catchInternalErrors(ClusterManager.scala:2605)
    at com.databricks.backend.manager.ClusterManager.doSetupCluster(ClusterManager.scala:478)
    at com.databricks.backend.manager.ClusterManager.doSetupOrUpsize(ClusterManager.scala:2771)
    at com.databricks.backend.manager.UpsizeThrottlingMonitor.$anonfun$processRequest$3(UpsizeThrottlingMonitor.scala:363)
    at com.databricks.backend.manager.util.ConsolidatedClusterUpdateHelper.$anonfun$withConsolidatedClusterUpdateForAsync$1(ConsolidatedClusterUpdateHelper.scala:142)
    at scala.util.Try$.apply(Try.scala:213)
    at com.databricks.backend.manager.util.ConsolidatedClusterUpdateHelper.withConsolidatedClusterUpdate(ConsolidatedClusterUpdateHelper.scala:61)
    at com.databricks.backend.manager.util.ConsolidatedClusterUpdateHelper.withConsolidatedClusterUpdateForAsync(ConsolidatedClusterUpdateHelper.scala:141)
    at com.databricks.backend.manager.util.ConsolidatedClusterUpdateHelper.withConsolidatedClusterUpdateForAsync$(ConsolidatedClusterUpdateHelper.scala:135)
    at com.databricks.backend.manager.UpsizeThrottlingMonitor.withConsolidatedClusterUpdateForAsync(UpsizeThrottlingMonitor.scala:76)
    at com.databricks.backend.manager.UpsizeThrottlingMonitor.$anonfun$processRequest$2(UpsizeThrottlingMonitor.scala:363)
    at …
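When the UI surfaces only a generic "Unexpected launch failure" like this, the cluster event log (`POST /api/2.0/clusters/events`) sometimes records a more specific termination reason. A hedged sketch of filtering such events out of an events response; the event shape follows the documented Clusters API, but verify the field names against your workspace:

```python
# Sketch: pull termination reasons out of a clusters/events API response.
# Event shape is based on the documented Clusters API (events carry a
# "type" and a "details" object); confirm against your workspace.

def termination_reasons(events_response):
    """Return (timestamp, reason code) for each TERMINATING event."""
    out = []
    for ev in events_response.get("events", []):
        if ev.get("type") == "TERMINATING":
            reason = ev.get("details", {}).get("reason", {})
            out.append((ev.get("timestamp"), reason.get("code")))
    return out

sample = {"events": [
    {"type": "CREATING", "timestamp": 1},
    {"type": "TERMINATING", "timestamp": 2,
     "details": {"reason": {"code": "UNEXPECTED_LAUNCH_FAILURE"}}},
]}
print(termination_reasons(sample))
```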

Thank you for your support


3 REPLIES

Hubert-Dudek
Esteemed Contributor III

Please check:

  • CPU quotas. Request an increase anyway (https://go.aws/3EvY1fX), and use instance pools for better control, since old instances can linger for a moment after termination.
  • The network configuration. The cluster may be downloading something from the internet over a blocked or slow network; third-party libraries in particular can cause this problem.
  • Starting a new cluster with the default Databricks configuration. Once it works, add your libraries back step by step.
  • The driver logs. They often contain more detail.
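On the quota point: launches that hang for a long time and then fail often come down to the regional EC2 On-Demand vCPU limit. A quick sanity check is to compare the vCPUs a cluster will request (workers plus driver) against that quota. A minimal sketch; the node-type vCPU counts and the quota code in the comment are assumptions to verify against your account:

```python
# Sketch: estimate the vCPUs a cluster launch will request, to compare
# against the EC2 On-Demand vCPU quota in the target region.
# The actual quota can be read with, e.g.:
#   aws service-quotas get-service-quota --service-code ec2 \
#       --quota-code L-1216C47A   # Running On-Demand Standard instances
# (quota code shown as an example; confirm it for your instance family)

# Illustrative vCPU counts; look up the real ones for your node types.
NODE_VCPUS = {"i3.xlarge": 4, "i3.2xlarge": 8, "m5.large": 2}

def vcpus_requested(node_type, num_workers, driver_node_type=None):
    """vCPUs needed for one cluster: num_workers workers plus one driver."""
    driver = driver_node_type or node_type
    return NODE_VCPUS[node_type] * num_workers + NODE_VCPUS[driver]

def fits_quota(requested, quota, in_use):
    """True if the launch fits under the regional vCPU quota."""
    return in_use + requested <= quota

# Example: 8 i3.xlarge workers plus an i3.xlarge driver = 36 vCPUs.
print(vcpus_requested("i3.xlarge", 8))
```

If `fits_quota` comes out False for your currently running instances plus the new cluster, the launch will stall even though no single quota appears "reached" at rest.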

Thanks for your reply.

Indeed, I was facing a networking issue. Your hint was very helpful!
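For anyone who hits the same networking symptom: a quick way to confirm whether the cluster's network can reach the endpoints a launch typically needs (the Databricks control plane, artifact repositories such as pypi.org or repo1.maven.org) is a plain TCP connect test run from a notebook or an instance in the same subnet. A minimal sketch; the host list is illustrative, and the real set of required endpoints depends on your region and setup:

```python
# Sketch: TCP reachability check for endpoints a cluster launch typically
# needs. Host list is illustrative; the real set depends on region/setup.
import socket

def reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False

for host in ("pypi.org", "repo1.maven.org"):
    print(host, "reachable" if reachable(host, 443) else "BLOCKED")
```

A host showing BLOCKED here points at the security group, NACL, route table, or proxy configuration rather than at Databricks itself.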

Hubert-Dudek
Esteemed Contributor III

Good to hear that it helped. If you can, please select my answer as the best one.
