Could not launch jobs due to node_type_id (instance) unavailability
08-18-2022 01:40 AM
I am running an hourly job on a cluster using the p3.2xlarge GPU instance type, but sometimes the cluster cannot start because the instance type is unavailable. I wonder if there is any fallback mechanism to, for example, try a different instance type if one is not available. Thanks
- Labels: Cluster, Jobs & Workflows
08-18-2022 10:27 AM
Hello,
Databricks will not switch to a different instance type if the configured type is unavailable in the AWS availability zone (AZ).
Have you set up auto-AZ for this job? It lets Databricks try a different availability zone within the same region if the instance type is unavailable in one AZ.
Reference: https://docs.databricks.com/clusters/configure.html#automatic-availability-zones-auto-az
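
For illustration, here is a minimal sketch of what a job cluster specification with auto-AZ might look like when built in Python. It assumes the Clusters API convention of setting `zone_id` to `"auto"` under `aws_attributes`. The helper name `build_cluster_spec`, the worker count, and the runtime version placeholder are assumptions for this example, not values from this thread.

```python
import json


def build_cluster_spec(node_type_id: str, num_workers: int, spark_version: str) -> dict:
    """Build a new_cluster dict that asks Databricks to pick the AZ automatically.

    Hypothetical helper for illustration; only the aws_attributes.zone_id
    setting is the point of the example.
    """
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # "auto" enables auto-AZ: another zone in the same region is tried
        # when the instance type has no capacity in the first one.
        "aws_attributes": {"zone_id": "auto"},
    }


# The runtime version below is a placeholder; substitute the one your job uses.
spec = build_cluster_spec("p3.2xlarge", 1, "<runtime-version>")
print(json.dumps(spec, indent=2))
```

The resulting dict would go in the `new_cluster` field of the job definition. Note that auto-AZ only changes the zone, not the instance type.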

08-18-2022 11:13 PM
Did you manage to solve your problem? I have the same problem.
06-27-2023 11:57 AM
(AWS only) For anyone experiencing capacity-related cluster launch failures on non-GPU instance types, AWS Fleet instance types are now GA and available for clusters and instance pools. They improve the chance of a successful cluster launch by allowing your cluster to use a mix of similar instance types. You can see more details here: https://docs.databricks.com/compute/aws-fleet-instances.html
Unfortunately fleet instance types don't support GPUs.
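
Since there is no built-in fallback for GPU instance types, one possible workaround is to handle it on the client side: try a list of acceptable instance types in order and stop at the first one that launches. The sketch below shows only that control flow. `CapacityError`, `launch_with_fallback`, and the `launch` callable are hypothetical names; in practice `launch` would wrap whatever API call you use to start the job and raise `CapacityError` when the launch fails for capacity reasons. Whether `g4dn.2xlarge` is a suitable substitute for your workload is also an assumption.

```python
from typing import Callable, Iterable, Optional


class CapacityError(Exception):
    """Hypothetical error: the requested instance type is unavailable."""


def launch_with_fallback(
    candidates: Iterable[str],
    launch: Callable[[str], None],
) -> Optional[str]:
    """Try each instance type in order; return the first that launches.

    Returns None if every candidate fails with CapacityError.
    """
    for node_type_id in candidates:
        try:
            launch(node_type_id)
            return node_type_id
        except CapacityError:
            # No capacity for this type; move on to the next candidate.
            continue
    return None


def fake_launch(node_type_id: str) -> None:
    """Stand-in launcher that simulates p3.2xlarge being unavailable."""
    if node_type_id == "p3.2xlarge":
        raise CapacityError(node_type_id)


chosen = launch_with_fallback(["p3.2xlarge", "g4dn.2xlarge"], fake_launch)
print(chosen)
```

Only capacity errors trigger the fallback here; any other exception propagates, so configuration mistakes are not silently retried on a different instance type.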

