Workflow cluster creation error
04-24-2023 09:10 PM
I scheduled the workflow to run at 12:00 every day, but the run failed with the error message below, and I don't know why.
Run result unavailable: run failed with error message
Unexpected failure while waiting for the cluster (0506-023332-glkykcrs) to be ready: Cluster 0506-023332-glkykcrs is in unexpected state Terminated: UNEXPECTED_LAUNCH_FAILURE(SERVICE_FAULT): databricks_error_message:com.google.common.util.concurrent.UncheckedExecutionException: com.databricks.rpc.ReliableJettyClient$MaxRetriesExceededException: Max retries exhausted with RPC com.databricks.api.proto.central.GetCustomerStorageInfo, max retry count: 3
Labels: Cluster, Error Message, Rpc, Workflow Cluster
04-25-2023 12:02 AM
@Sangwoo Lee Hi, this seems to be an infrastructure-related issue. Please check the cluster's event log to see whether your cloud provider was able to provision the requested instances. If the problem persists, try choosing an instance type that is more commonly available in your cloud region. We have run into this kind of issue when spinning up all-purpose clusters that requested high-compute machines on AWS.
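If you prefer to pull the event log programmatically rather than from the cluster UI, the sketch below posts to the Clusters API `events` endpoint and extracts any termination reasons. It assumes a workspace URL with scheme (e.g. `https://<workspace-host>`), a personal access token, and the response shape `{"events": [{"type": ..., "details": {"reason": {"code": ..., "type": ...}}}]}`; `fetch_cluster_events` and `termination_reasons` are hypothetical helper names, not part of any Databricks library.

```python
import json
import urllib.request


def fetch_cluster_events(host, token, cluster_id, limit=50):
    """POST to the Clusters API events endpoint and return the parsed JSON.

    Assumption: `host` includes the scheme, e.g. "https://<workspace-host>".
    """
    body = json.dumps({"cluster_id": cluster_id, "order": "DESC", "limit": limit}).encode()
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/events",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def termination_reasons(response):
    """Return (event type, reason code, reason type) for every event that carries a reason."""
    out = []
    for event in response.get("events", []):
        reason = event.get("details", {}).get("reason")
        if reason:
            out.append((event.get("type"), reason.get("code"), reason.get("type")))
    return out
```

For the failure in the question, you would expect a reason such as `UNEXPECTED_LAUNCH_FAILURE` with type `SERVICE_FAULT`; a `SERVICE_FAULT` type suggests the problem was on the platform side rather than in your job code.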
04-25-2023 02:08 AM
Hello @Sangwoo Lee,
As Vignesh mentioned, this looks like an infrastructure-related issue.
> Does the user who runs the job have permission to start a cluster?
> If it is not an access issue and you are starting a lot of workflow jobs at the same time, try scheduling one job 5 minutes earlier (just to start the cluster) and schedule the remaining jobs to start together 5 minutes later. The idea is to have the cluster already running when most of the jobs need it.
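The staggering idea above can be expressed as two daily schedules. The sketch below is a minimal illustration, assuming the jobs share the same cluster (otherwise warming one up does not help the others) and that the schedules use Quartz cron syntax (seconds, minutes, hours, day-of-month, month, day-of-week), which is what a Databricks job's `quartz_cron_expression` field expects. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def daily_quartz_cron(hour, minute):
    """Quartz cron (sec min hour day-of-month month day-of-week) for a daily run."""
    return f"0 {minute} {hour} * * ?"


def staggered_schedules(hour, minute, lead_minutes=5):
    """Return (warm-up cron, main cron).

    The warm-up job starts `lead_minutes` before the main jobs; a fixed
    reference date is used only so that times before midnight wrap correctly.
    """
    main = datetime(2000, 1, 2, hour, minute)
    warm = main - timedelta(minutes=lead_minutes)
    return daily_quartz_cron(warm.hour, warm.minute), daily_quartz_cron(main.hour, main.minute)
```

For the 12:00 schedule in the question, `staggered_schedules(12, 0)` gives `"0 55 11 * * ?"` for the warm-up job and `"0 0 12 * * ?"` for the remaining jobs.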