Unavailable GPU compute

oye
New Contributor II

Hello,

I would like to create an ML compute with a GPU. I am on GCP europe-west1, and the only options available to me are the G2 family and one instance type from the A3 family (a3-highgpu-8g [H100]). I have tried several times on different occasions, but I keep getting this error:

Gcp Insufficient Capacity:
The VM launch operation failed due to resource exhaustion. [details] VM_MIN_COUNT_NOT_REACHED|ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS: Requested minimum count of 1 VMs could not be created.|The zone 'projects/com-melexis-prod-asgard/zones/europe-west1-c' does not have enough resources available to fulfill the request. '(resource type:compute)'.(OnDemand)

(screenshot also attached to this post)

After some research, this seems to indicate a temporary(?) unavailability of these GPUs in my zone. I changed zones multiple times, but to no avail. Can someone confirm that this is indeed the problem? Am I missing something? Is there a way to reliably create GPU compute in my region, or do I need a workspace in a "better" region?

SP_6721
Honored Contributor II

Hi @oye ,

You’re hitting a cloud capacity issue, not a Databricks configuration problem. The Databricks GCP GPU docs list A2 and G2 as the supported GPU instance families. A3/H100 is not in the supported list: https://docs.databricks.com/gcp/en/compute/gpu?language=G2
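
To show why this reads as a capacity problem rather than a configuration one, here is a minimal sketch that picks apart the error text from the original post. The function name `parse_capacity_error` and the `is_capacity_issue` heuristic (looking for "EXHAUSTED" in the error codes) are my own illustration, not part of any Databricks or GCP library.

```python
import re

# Error text copied from the original post.
ERROR = (
    "VM_MIN_COUNT_NOT_REACHED|ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS: "
    "Requested minimum count of 1 VMs could not be created.|"
    "The zone 'projects/com-melexis-prod-asgard/zones/europe-west1-c' "
    "does not have enough resources available to fulfill the request. "
    "'(resource type:compute)'.(OnDemand)"
)


def parse_capacity_error(message: str) -> dict:
    """Extract the upper-case error codes and the zone name from the message."""
    # Error codes look like WORD_WORD_WORD (upper case, joined by underscores).
    codes = re.findall(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b", message)
    # The zone is the path segment that follows "zones/".
    zone = re.search(r"zones/([a-z0-9-]+)", message)
    return {
        "codes": codes,
        "zone": zone.group(1) if zone else None,
        # Assumption: an "EXHAUSTED" code signals a capacity shortage.
        "is_capacity_issue": any("EXHAUSTED" in code for code in codes),
    }


info = parse_capacity_error(ERROR)
print(info)
```

Both codes point at the zone's resource pool, and the zone named is europe-west1-c, so nothing in the message suggests a workspace or cluster setting is wrong.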

Also check this out: Availability Zone and HA

oye
New Contributor II

Indeed, I do not see A3 listed in the Databricks documentation, but I nevertheless see it as an option. I never actually tried to start the cluster with an A3 machine since it looks very expensive.

I have also tried HA and auto zone in the advanced settings, but also to no avail. I guess this is just due to limited resources in europe-west1 then.

SP_6721
Honored Contributor II

You’re correct: this is most likely due to temporary GPU capacity constraints in europe-west1. The best workaround is to try other zones within the region or to use a nearby region where GPU capacity is more readily available.
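
If you want to automate the "try other zones" step, a rough sketch could look like the one below. Treat everything in it as an assumption to verify: the `gcp_attributes.zone_id` field is how I understand the Databricks cluster spec on GCP to select a zone, `g2-standard-8` is only an example node type, the zone list should be checked against what GCP actually offers for your GPU, and `CapacityError`, `cluster_spec`, `launch_with_zone_fallback`, and `fake_launch` are hypothetical names. The real launch call is passed in as a function, so no particular SDK is implied.

```python
from typing import Callable, Iterable, Optional


class CapacityError(Exception):
    """Hypothetical stand-in for a capacity-related launch failure."""


def cluster_spec(zone_id: str) -> dict:
    """Build a partial cluster spec pinned to one zone.

    Other required fields (runtime version and so on) are left out on purpose.
    """
    return {
        "cluster_name": f"gpu-ml-{zone_id}",
        "node_type_id": "g2-standard-8",         # assumed example node type
        "num_workers": 1,
        "gcp_attributes": {"zone_id": zone_id},  # assumed field name
    }


def launch_with_zone_fallback(
    zones: Iterable[str], launch: Callable[[dict], None]
) -> Optional[str]:
    """Try each zone in order; return the first zone that launches, else None."""
    for zone in zones:
        try:
            launch(cluster_spec(zone))
            return zone
        except CapacityError:
            continue  # this zone is exhausted, move on to the next one
    return None


# Demo with a fake launcher: only europe-west1-b "has capacity".
def fake_launch(spec: dict) -> None:
    if spec["gcp_attributes"]["zone_id"] != "europe-west1-b":
        raise CapacityError(spec["gcp_attributes"]["zone_id"])


chosen = launch_with_zone_fallback(
    ["europe-west1-c", "europe-west1-d", "europe-west1-b"], fake_launch
)
print(chosen)
```

If every zone in the region raises the capacity error, the function returns `None`, which is the point at which a workspace in a nearby region becomes the practical option.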