
cluster nodes unavailable scenarios

Nino
Contributor
Concerning job cluster configuration, I'm trying to figure out what happens if the number of available AWS instances of the chosen node type is smaller than the minimum number of workers specified in the configuration JSON (either availability < `num_workers` or, for autoscaling, availability < `min_workers`).
 
Seeking insights into both scenarios:
  1. Low availability at cluster start
  2. Availability drop while computation is already in progress

Will the cluster start/continue computation? Wait? Fail?
Are there configurations to tweak related cluster behavior?
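For reference, the two configuration shapes in question look roughly like the sketch below, written as Python dicts. `num_workers` and `autoscale.min_workers` / `autoscale.max_workers` are the fields discussed above; the `spark_version`, `node_type_id`, and worker counts are placeholder values, and `min_required_workers` is a hypothetical helper added only to show which number availability is being compared against.

```python
# Fixed-size job cluster: asks for exactly `num_workers` workers.
fixed_cluster = {
    "spark_version": "13.3.x-scala2.12",  # placeholder value
    "node_type_id": "i3.xlarge",          # placeholder value
    "num_workers": 100,                   # placeholder value
}

# Autoscaling job cluster: may run anywhere between min_workers and max_workers.
autoscaling_cluster = {
    "spark_version": "13.3.x-scala2.12",  # placeholder value
    "node_type_id": "i3.xlarge",          # placeholder value
    "autoscale": {"min_workers": 50, "max_workers": 100},  # placeholder values
}


def min_required_workers(new_cluster: dict) -> int:
    """Hypothetical helper: the smallest worker count the config asks for."""
    autoscale = new_cluster.get("autoscale")
    if autoscale is not None:
        return autoscale["min_workers"]
    return new_cluster["num_workers"]


print(min_required_workers(fixed_cluster))
print(min_required_workers(autoscaling_cluster))
```

The question is what the cluster does when fewer instances than this minimum can be acquired.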
 
thanks!
1 REPLY

Nino
Contributor

Thanks, @Retired_mod, useful info!

My specific scenario is running a notebook task on Job Clusters. I've noticed that I get the best overall notebook run time without Autoscaling, by setting a fixed `num_workers` in the cluster configuration. (Specifically, it's a single notebook where a heavy ETL operation is followed by a lightweight cmd cell, then something heavy again, so with Autoscaling the cluster scales up and down a lot.)

So, by your explanation, the num_workers approach puts me at risk in the case of low instance availability. This can be mitigated by Autoscaling, which in turn leads to increased run time. 

Is there a way to configure the Job Cluster so that it "aspires" to an ideal size, but doesn't fail if this ideal isn't reached?

This would be similar to Autoscaling, except that the cluster would not downsize voluntarily (it would downsize only if reduced availability forces it to, and even then it wouldn't immediately fail). So if configured to "aspire" to 100 nodes, it would wait x minutes and then start if more than 50 nodes are available. If availability grows, say, 30 minutes later, it would upscale, still "aspiring" to those 100.
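To pin down the behavior I mean, here is a minimal sketch in plain Python. It is purely a description of the wished-for policy, not a Databricks API: the function names, parameter names, and the choice to keep waiting when availability stays at or below the floor are all assumptions made for illustration.

```python
from typing import Optional


def aspired_start_size(available: int, ideal: int, floor: int,
                       waited_minutes: float,
                       max_wait_minutes: float) -> Optional[int]:
    """Hypothetical start policy: worker count to start with, or None to keep waiting."""
    if available >= ideal:
        return ideal  # the ideal size can be reached, start right away
    if waited_minutes < max_wait_minutes:
        return None  # still inside the "wait x minutes" window
    if available > floor:
        return available  # settle for fewer nodes, as long as it's above the floor
    return None  # assumption: keep waiting instead of failing


def running_target(available: int, ideal: int) -> int:
    """Hypothetical resize policy: never shrink voluntarily, never exceed the ideal."""
    return min(ideal, available)


# Aspire to 100 nodes, accept anything above 50 after waiting 10 minutes.
print(aspired_start_size(available=70, ideal=100, floor=50,
                         waited_minutes=10, max_wait_minutes=10))
print(running_target(available=120, ideal=100))
```

The difference from Autoscaling is that the target depends only on availability and the ideal size, never on the current load.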

Can something like this be achieved?

Thanks!   
