
Pool Max Capacity vs Cluster Max Workers

EDDatabricks
Contributor

Hi all,

We have a Databricks instance on Azure with a compute cluster running Databricks Runtime 7.3 LTS.

Currently the cluster has 4 max workers (min workers: 1) of type: Standard_D13_v2 and 1 driver of the same type. There are several jobs that are running on this cluster.

We are thinking of using Instance Pools to improve the initial spin-up time of the cluster. If we set the pool's max capacity to 2, I understand that when the cluster tries to scale up toward its 4 max workers, the request will fail and the cluster will never grow.

See: https://learn.microsoft.com/en-gb/azure/databricks/clusters/instance-pools/configure.

Maximum Capacity
 
The maximum number of instances that the pool will provision. If set, this value constrains all instances (idle + used). If a cluster using the pool requests more instances than this number during autoscaling, the request will fail with an INSTANCE_POOL_MAX_CAPACITY_FAILURE error.

Another colleague tells me that the cluster will still scale up to 4 worker nodes, just with a slower start time (essentially acting as if the pool did not exist, but only for those extra workers). Can you please advise?

Further to the above example, to accommodate the current setup of 4 max workers and 1 driver, we would actually need to set the max capacity to 5 (4 workers + 1 driver). Can you please confirm my understanding, or am I missing something?

Accepted Solution

Lakshay
Esteemed Contributor

Hi @EDDatabricks, if you set the maximum capacity of the pool to 5, the pool will not be able to grow beyond 5 instances. The max pool capacity counts the total number of instances, including both the driver and the workers, so 5 instances means 4 workers and 1 driver.
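For anyone who wants to sanity-check the arithmetic, here is a minimal Python sketch. It assumes a single cluster draws from the pool and that the driver comes from the same pool; the helper names `required_pool_capacity` and `fits_in_pool` are made up for illustration and are not part of any Databricks API.

```python
from typing import Optional


def required_pool_capacity(max_workers: int, driver_in_pool: bool = True) -> int:
    """Instances the pool must hold for one cluster at full scale.

    Assumption: only this cluster uses the pool. If several clusters
    share it, their requirements add up.
    """
    return max_workers + (1 if driver_in_pool else 0)


def fits_in_pool(requested_instances: int, max_capacity: Optional[int]) -> bool:
    """True if the request stays within max capacity (None means no limit is set)."""
    return max_capacity is None or requested_instances <= max_capacity


needed = required_pool_capacity(max_workers=4)
print(needed)                    # 5
print(fits_in_pool(needed, 5))   # True
print(fits_in_pool(needed, 2))   # False: the case the quoted docs describe as failing
```

If more than one cluster ends up sharing the pool, sum the result for each cluster before picking a max capacity.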

That said, you don't necessarily need to configure a maximum capacity for a pool. Leaving it unset allows the pool to pull as many instances as needed. The purpose of maximum capacity is to make sure a pool does not create instances beyond a certain number, ensuring the pool does not cost more than expected.
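As a rough illustration of the two options (capped vs. uncapped), the sketch below builds a pool definition as a plain Python dict. The field names (`instance_pool_name`, `node_type_id`, `min_idle_instances`, `max_capacity`) follow the Instance Pools REST API as I understand it, so please verify them against the current docs; the pool name and the `build_pool_spec` helper are hypothetical.

```python
import json
from typing import Optional


def build_pool_spec(
    name: str,
    node_type_id: str,
    min_idle_instances: int = 0,
    max_capacity: Optional[int] = None,
) -> dict:
    """Build a pool definition; max_capacity is omitted entirely when not set."""
    spec = {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        "min_idle_instances": min_idle_instances,
    }
    if max_capacity is not None:
        # Caps idle + used instances, which bounds the cost of the pool.
        spec["max_capacity"] = max_capacity
    return spec


# Capped at 5: room for 4 workers + 1 driver of the cluster in the question.
capped = build_pool_spec("example-pool", "Standard_D13_v2", max_capacity=5)
# Uncapped: the pool can grow as needed.
uncapped = build_pool_spec("example-pool", "Standard_D13_v2")

print(json.dumps(capped, indent=2))
print(json.dumps(uncapped, indent=2))
```

The dict only describes the pool; actually creating it (UI, REST call, or Terraform) is left out on purpose.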



Anonymous
Not applicable

Hi @EDDatabricks,

Hope all is well! Just wanted to check whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
