CLOUD_PROVIDER_LAUNCH_FAILURE (CLOUD_FAILURE) for workflow job with all-purpose cluster
06-26-2024 03:21 AM
One of our Databricks workflow jobs fails occasionally with the error below. After re-running, it works fine without any issue.
What is the exact reason for the issue, and how can we fix it?
Error:
Unexpected failure while waiting for the cluster to be ready: Cluster 'XXX' was terminated. Reason: CLOUD_PROVIDER_LAUNCH_FAILURE (CLOUD_FAILURE). Parameters: azure_error_code:StorageFailure, azure_error_message:Error while creating storage object'
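For anyone who wants to track how often this happens, the message above can be split into its termination code, type, and parameters. This is only a minimal sketch: the function name `parse_termination_reason` is hypothetical, and the message layout is assumed from the single error shown in this post, not from any documented format.

```python
import re


def parse_termination_reason(message: str) -> dict:
    """Split a cluster termination message into code, type, and parameters.

    Assumes the layout seen in this thread:
    "... Reason: CODE (TYPE). Parameters: key:value, key:value"
    """
    result = {"code": None, "type": None, "parameters": {}}

    # "Reason: CLOUD_PROVIDER_LAUNCH_FAILURE (CLOUD_FAILURE)"
    reason = re.search(r"Reason:\s*(\w+)\s*\((\w+)\)", message)
    if reason:
        result["code"] = reason.group(1)
        result["type"] = reason.group(2)

    # Everything after "Parameters:" is treated as comma-separated key:value pairs.
    _, found, params = message.partition("Parameters:")
    if found:
        for key, value in re.findall(r"(\w+):\s*([^,]+)", params):
            # Drop surrounding whitespace and the stray trailing quote.
            result["parameters"][key] = value.strip().rstrip("'")

    return result
```

A value that itself contains a comma would be cut short by this simple split, so treat the output as a rough classification aid only.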
- Labels: Workflows
06-26-2024 03:51 AM
What cloud are you on? Azure? In my experience, this error indicates one of the following:
- There was a temporary cloud outage/downtime.
- But most likely: you've reached your cloud subscription's VM/CPU quota, in which case you need to increase the subscription quota for the VM type you're using.
06-26-2024 04:46 AM
It's Azure. We have VM/CPU quota available, so I don't think that is causing the issue.
06-26-2024 04:54 AM
The problem with this error is that it's very hard to debug.
I would create an Azure support ticket so they can look into the actual cause.
07-11-2024 02:06 AM
These are cloud-provider-related errors, and the error message does not give us much detail. Based on the error message, and given that you have enough CPU/VM quota available, I think the issue might occur at the storage creation stage of the cluster startup.
If the workspace is a VNet-injected workspace, you can try adding the Microsoft.Storage service endpoint to both the public and private subnets used for the workspace and check whether you still see the same behaviour.
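As a rough illustration of the service endpoint suggestion, the sketch below builds one Azure CLI `az network vnet subnet update` command per subnet. The helper name `service_endpoint_commands` and the resource group, VNet, and subnet names are placeholders, not values from this thread. As far as I know, `--service-endpoints` sets the subnet's full endpoint list, so any endpoints already configured on the subnet would need to be included as well; please verify this against your own setup before running anything.

```python
import shlex


def service_endpoint_commands(resource_group, vnet_name, subnet_names,
                              endpoint="Microsoft.Storage"):
    """Build one `az network vnet subnet update` command per subnet.

    Returns argument lists only; nothing is executed here.
    """
    return [
        [
            "az", "network", "vnet", "subnet", "update",
            "--resource-group", resource_group,
            "--vnet-name", vnet_name,
            "--name", subnet,
            "--service-endpoints", endpoint,
        ]
        for subnet in subnet_names
    ]


if __name__ == "__main__":
    # Placeholder names; replace with the workspace's actual values.
    commands = service_endpoint_commands(
        "my-resource-group", "my-vnet", ["public-subnet", "private-subnet"]
    )
    for command in commands:
        # Print the commands for review instead of running them.
        print(shlex.join(command))
```

Printing the commands first makes it easy to review them, or to paste them into a shell, before changing the network configuration.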

