Databricks Community

Serhii · ‎08-18-2022

I am running an hourly job on a cluster using a p3.2xlarge GPU instance, but sometimes the cluster can't start due to instance unavailability. I wonder if there is any fallback mechanism to, for example, try a different instance type if one is not available. Thanks

User16873043099 · ‎08-18-2022

Hello,

The instance type is never switched automatically to a different one if the configured type is unavailable in the AWS availability zone (AZ).

Have you set up auto-AZ for this job? It lets Databricks try a different availability zone within the same region if the instance type is unavailable in one AZ.

Reference: https://docs.databricks.com/clusters/configure.html#automatic-availability-zones-auto-az
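For illustration, here is a minimal sketch of what a cluster spec with auto-AZ might look like when built in Python. It assumes the Clusters API convention of setting `aws_attributes.zone_id` to `"auto"`; the helper name `build_cluster_spec` and the runtime version placeholder are hypothetical, so please check the reference above for the exact fields in your workspace.

```python
import json


def build_cluster_spec(node_type_id: str, num_workers: int, spark_version: str) -> dict:
    """Build a cluster spec dict with auto-AZ enabled (hypothetical helper)."""
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Assumption: "auto" asks Databricks to pick an AZ in the region
        # that has capacity, instead of pinning the cluster to one zone.
        "aws_attributes": {"zone_id": "auto"},
    }


# The runtime version below is a placeholder, not a real version string.
spec = build_cluster_spec("p3.2xlarge", 1, "<your-runtime-version>")
print(json.dumps(spec, indent=2))
```

Note that this only changes which AZ is tried; the instance type itself stays p3.2xlarge.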

Anonymous · ‎08-18-2022

Did you manage to solve your problem? I have the same problem.

abagshaw · ‎06-27-2023

(AWS only) For anyone experiencing capacity-related cluster launch failures on non-GPU instance types, AWS Fleet instance types are now GA and available for clusters and instance pools. They help improve the chance of a successful cluster launch by allowing your cluster to use a mix of similar instance types. You can see more details here: https://docs.databricks.com/compute/aws-fleet-instances.html

Unfortunately fleet instance types don't support GPUs.
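As a rough sketch of how this could look in a cluster spec built in Python: the fleet type name `m-fleet.xlarge` is an assumption based on the naming used in the linked docs, and the helper `is_fleet_type` and the runtime version placeholder are hypothetical. Please confirm the available fleet types on the page above before using one.

```python
def is_fleet_type(node_type_id: str) -> bool:
    """Return True if the node type looks like a fleet type (hypothetical check)."""
    # Assumption: fleet type names contain "-fleet." (for example "m-fleet.xlarge").
    return "-fleet." in node_type_id


fleet_spec = {
    "spark_version": "<your-runtime-version>",  # placeholder, not a real version
    "node_type_id": "m-fleet.xlarge",  # assumed fleet type name, non-GPU
    "num_workers": 2,
}

print(is_fleet_type(fleet_spec["node_type_id"]))
print(is_fleet_type("p3.2xlarge"))
```

A GPU type such as p3.2xlarge is not a fleet type, so for the original question auto-AZ remains the option discussed in this thread.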