
Failure during cluster launch

arkadiuszr
New Contributor III

Hi all,

I am migrating from an older Databricks deployment to Databricks E2. I moved the cluster definitions over from the old Databricks instance and also created new ones. In both cases, Databricks tries to start the cluster for an hour and then fails. This happens in both Single Node and Standard modes.

I have also gone through this topic, but without any luck: https://community.databricks.com/s/question/0D53f00001in5HDCAY/databricks-cluster-create-fail

As far as I can see, no AWS quotas have been reached.

Cluster terminated. Reason: Unexpected launch failure

An unexpected error was encountered while setting up the cluster. Please retry and contact Databricks if the problem persists.

Internal error message: java.lang.RuntimeException: Internal error (no failure to report) at com.databricks.backend.manager.AddResourcesStateHelper$.<init>(AddResourcesState.scala:216) at com.databricks.backend.manager.AddResourcesStateHelper$.<clinit>(AddResourcesState.scala) at com.databricks.backend.manager.ClusterManager.shouldStopAddingNodes(ClusterManager.scala:3979) at com.databricks.backend.manager.ClusterManager.runAddResourceSteps(ClusterManager.scala:4125) at com.databricks.backend.manager.ClusterManager.addResourcesToCluster(ClusterManager.scala:4033) at com.databricks.backend.manager.ClusterManager.$anonfun$doAddContainersToCluster$1(ClusterManager.scala:2158) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:366) at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:460) at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:480) at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$2(UsageLogging.scala:232) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94) at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:230) at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:212) at com.databricks.backend.manager.ClusterManager.withAttributionContext(ClusterManager.scala:147) at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276) at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:261) at com.databricks.backend.manager.ClusterManager.withAttributionTags(ClusterManager.scala:147) at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:455) at 
com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:375) at com.databricks.backend.manager.ClusterManager.recordOperationWithResultTags(ClusterManager.scala:147) at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:366) at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:338) at com.databricks.backend.manager.ClusterManager.recordOperation(ClusterManager.scala:147) at com.databricks.backend.manager.ClusterManager.doAddContainersToCluster(ClusterManager.scala:2158) at com.databricks.backend.manager.ClusterManager.$anonfun$doSetupCluster$3(ClusterManager.scala:542) at com.databricks.backend.manager.ClusterManager.withAuditLog(ClusterManager.scala:2578) at com.databricks.backend.manager.ClusterManager.$anonfun$doSetupCluster$2(ClusterManager.scala:496) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:366) at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:460) at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:480) at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$2(UsageLogging.scala:232) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94) at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:230) at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:212) at com.databricks.backend.manager.ClusterManager.withAttributionContext(ClusterManager.scala:147) at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276) at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:261) at com.databricks.backend.manager.ClusterManager.withAttributionTags(ClusterManager.scala:147) at 
com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:455) at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:375) at com.databricks.backend.manager.ClusterManager.recordOperationWithResultTags(ClusterManager.scala:147) at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:366) at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:338) at com.databricks.backend.manager.ClusterManager.recordOperation(ClusterManager.scala:147) at com.databricks.backend.manager.ClusterManager.$anonfun$doSetupCluster$1(ClusterManager.scala:496) at com.databricks.backend.manager.ClusterManager.catchInternalErrors(ClusterManager.scala:2605) at com.databricks.backend.manager.ClusterManager.doSetupCluster(ClusterManager.scala:478) at com.databricks.backend.manager.ClusterManager.doSetupOrUpsize(ClusterManager.scala:2771) at com.databricks.backend.manager.UpsizeThrottlingMonitor.$anonfun$processRequest$3(UpsizeThrottlingMonitor.scala:363) at com.databricks.backend.manager.util.ConsolidatedClusterUpdateHelper.$anonfun$withConsolidatedClusterUpdateForAsync$1(ConsolidatedClusterUpdateHelper.scala:142) at scala.util.Try$.apply(Try.scala:213) at com.databricks.backend.manager.util.ConsolidatedClusterUpdateHelper.withConsolidatedClusterUpdate(ConsolidatedClusterUpdateHelper.scala:61) at com.databricks.backend.manager.util.ConsolidatedClusterUpdateHelper.withConsolidatedClusterUpdateForAsync(ConsolidatedClusterUpdateHelper.scala:141) at com.databricks.backend.manager.util.ConsolidatedClusterUpdateHelper.withConsolidatedClusterUpdateForAsync$(ConsolidatedClusterUpdateHelper.scala:135) at com.databricks.backend.manager.UpsizeThrottlingMonitor.withConsolidatedClusterUpdateForAsync(UpsizeThrottlingMonitor.scala:76) at com.databricks.backend.manager.UpsizeThrottlingMonitor.$anonfun$processRequest$2(UpsizeThrottlingMonitor.scala:363) at

Thank you for your support

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

Please check:

  • CPU quotas. Request an increase anyway (https://go.aws/3EvY1fX), and use pools for better control, since old instances can stay around for a moment after termination.
  • The network configuration. The cluster may be downloading something from the internet while the network is blocked or slow; third-party libraries in particular can cause this problem.
  • A new cluster with the default Databricks configuration. If it starts, add your libraries back one at a time.
  • The driver logs, which may contain more details.
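
The "default configuration first, then add things back" step can be sketched in Python. This is only an illustration, not an official Databricks procedure: it takes a migrated cluster definition as a dict, strips a few optional customizations (`init_scripts`, `spark_conf`, `spark_env_vars`, `custom_tags`, `docker_image`, chosen here as likely suspects), and then yields specs that re-add them one at a time. The helper names and all values in the example spec are hypothetical placeholders. Libraries are attached separately from the cluster definition, so they would need the same one-at-a-time treatment on their own.

```python
import json

# Optional customizations treated as launch-failure suspects in this sketch.
# The selection is an assumption; adjust it to your own cluster definitions.
OPTIONAL_FIELDS = (
    "init_scripts",
    "spark_conf",
    "spark_env_vars",
    "custom_tags",
    "docker_image",
)


def split_cluster_spec(spec):
    """Split a cluster spec into (baseline, removed).

    baseline: the spec without the optional customizations.
    removed:  the customizations that were stripped, in their original order.
    """
    baseline = {k: v for k, v in spec.items() if k not in OPTIONAL_FIELDS}
    removed = {k: v for k, v in spec.items() if k in OPTIONAL_FIELDS}
    return baseline, removed


def incremental_specs(baseline, removed):
    """Yield (field_name, spec) pairs, re-adding one customization per step.

    Each yielded spec contains the baseline plus every customization
    re-added so far, so the first spec that fails to launch points at
    the field added in that step.
    """
    current = dict(baseline)
    for key, value in removed.items():
        current = {**current, key: value}
        yield key, current


if __name__ == "__main__":
    # Hypothetical migrated definition; every value is a placeholder.
    migrated = {
        "cluster_name": "migrated-cluster",
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
        "init_scripts": [{"dbfs": {"destination": "dbfs:/init/setup.sh"}}],
        "spark_conf": {"spark.sql.shuffle.partitions": "64"},
    }
    baseline, removed = split_cluster_spec(migrated)
    print("Try this first:")
    print(json.dumps(baseline, indent=2))
    for field, spec in incremental_specs(baseline, removed):
        print(f"Then re-add {field}:")
        print(json.dumps(spec, indent=2))
```

Each printed spec can be pasted into the cluster's JSON editor or sent to the cluster creation endpoint; the sketch deliberately makes no API calls itself.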

Thanks for your reply.

Indeed, I had been facing a networking issue. Your hint was very helpful!
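
For anyone hitting the same symptom, a blocked or slow network can be probed with a few lines of Python run from a machine in the same subnet as the cluster nodes. This is a minimal sketch, not the exact check used in this thread: the function names are made up for illustration, and the two hosts are only examples of package repositories a cluster might reach for. Replace them with whatever your init scripts and libraries actually download from.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map "host:port" to a reachability flag for each (host, port) pair."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in endpoints
    }


if __name__ == "__main__":
    # Example endpoints only (assumption); use the ones your cluster depends on.
    endpoints = [("pypi.org", 443), ("repo1.maven.org", 443)]
    for name, ok in check_endpoints(endpoints).items():
        print(f"{name}: {'reachable' if ok else 'BLOCKED or unreachable'}")
```

A successful TCP connection only shows that the route and security groups allow the traffic; it says nothing about throughput, so a slow link still needs a separate download test.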

Hubert-Dudek
Esteemed Contributor III

Good to hear that it helped. If you can, please select my answer as the best one.

Kaniz
Community Manager

Thank you @Hubert Dudek for your fantastic response.
