Thanks, @Retired_mod, useful info!
My specific scenario is running a notebook task on a Job Cluster. I get the best overall notebook run time without autoscaling, by setting a fixed `num_workers` in the cluster configuration. The notebook runs a heavy ETL operation, then a lightweight command cell, then something heavy again, so with autoscaling the cluster scales up and down a lot.
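For reference, the two job cluster shapes I'm comparing look roughly like this, written as Python dicts in the shape of a Jobs API `new_cluster` block. This is only a sketch: the `spark_version` and `node_type_id` values are placeholders and the worker counts are illustrative, not my real settings.

```python
# Fixed-size job cluster: always requests exactly `num_workers` workers.
fixed_cluster = {
    "spark_version": "<spark-version>",  # placeholder, not a real value
    "node_type_id": "<node-type>",       # placeholder, not a real value
    "num_workers": 100,                  # illustrative worker count
}

# Autoscaling job cluster: sized between min_workers and max_workers,
# and free to scale down when the load drops.
autoscaling_cluster = {
    "spark_version": "<spark-version>",  # placeholder, not a real value
    "node_type_id": "<node-type>",       # placeholder, not a real value
    "autoscale": {
        "min_workers": 50,               # illustrative lower bound
        "max_workers": 100,              # illustrative upper bound
    },
}
```

A cluster spec uses either `num_workers` or `autoscale`, not both.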
So, by your explanation, the `num_workers` approach puts me at risk when instance availability is low. Autoscaling mitigates that risk, but at the cost of increased run time.
Is there a way to configure the Job Cluster so that it "aspires" to an ideal size, but doesn't fail if that ideal isn't reached?
This would be similar to autoscaling, except that the cluster would not downsize voluntarily. It would downsize only if reduced availability forced it to, and even then it wouldn't fail immediately. So if configured to "aspire" to 100 nodes, it would wait x minutes and then start if more than 50 nodes are available. If availability grows, say, 30 minutes later, it would upscale, again "aspiring" to those 100...
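To make the behavior I'm asking about concrete, here is a rough Python sketch of the sizing rule. It is purely hypothetical: `desired_workers`, `ideal`, and `floor` are names I made up for illustration, not Databricks settings, and the x-minute wait before starting is left out.

```python
from typing import Optional


def desired_workers(available: int, ideal: int = 100, floor: int = 50) -> Optional[int]:
    """Hypothetical sizing rule: how many workers the cluster should hold.

    Returns None when too few instances are available, meaning
    "keep waiting" rather than "fail".
    """
    if available > floor:
        # Take as many workers as availability allows, capped at the ideal.
        return min(available, ideal)
    # Not enough capacity yet: wait instead of failing.
    return None
```

Note that the current workload is not an input, so the size only ever follows availability and the cluster never shrinks just because a cell is lightweight.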
Can something like this be achieved?
Thanks!