Cluster Upsize Issue: Storage Download Failure Slow
09-25-2024 12:18 PM
Hi,
We're currently experiencing the following issue across our entire Databricks workspace whenever we start a cluster, run a workflow, or upscale a running cluster. We receive the following errors on our all-purpose (AP) clusters and job clusters:
Compute upsize complete, but below target size. The current worker count is 6, out of a target of 8. Reason: Storage Download Failure Slow
Cluster '0925-190009-qlelyoz' was terminated. Reason: STORAGE_DOWNLOAD_FAILURE_SLOW (CLIENT_ERROR). Parameters: databricks_error_message:Downloading worker artifacts onto the instance timed out.
This results in workflows failing and AP clusters not being able to gather additional resources. I haven't seen any similar issues across the community and was wondering how we can go about troubleshooting this issue.
Thank you,
09-25-2024 01:08 PM
Hi @sdick_vg ,
The error points to connectivity problems when the cluster nodes try to reach Azure Storage.
Have you maybe enabled any kind of firewall in your organization recently?
Could you run some code, for example, to test DNS resolution to your storage account?
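A minimal sketch of such a check, runnable from a notebook cell on a cluster that does start. The hostname is a placeholder and an assumption, not a value from this thread; replace it with the storage endpoint your workspace actually uses. The helper names `resolve_host` and `can_connect` are made up for illustration.

```python
import socket

# Placeholder (assumption): substitute the real storage account hostname.
STORAGE_HOST = "<storage-account>.blob.core.windows.net"


def resolve_host(hostname):
    """Return the sorted list of IP addresses that hostname resolves to.

    Returns an empty list if DNS resolution fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        print(f"DNS resolution failed for {hostname}: {exc}")
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def can_connect(hostname, port=443, timeout=5.0):
    """Return True if a TCP connection to hostname:port opens within timeout."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"TCP connection to {hostname}:{port} failed: {exc}")
        return False
```

To use it, call `resolve_host(STORAGE_HOST)` and then `can_connect(STORAGE_HOST)`. If the name resolves but the connection fails or times out, that would suggest a firewall or routing rule rather than DNS; if it does not resolve at all, DNS settings on the VNet are the first thing to look at.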
Have you made any changes to the VNet where the Databricks storage account is located?

