
Pool Max Capacity vs Cluster Max Workers

EDDatabricks
Contributor

Hi all,

We have a Databricks instance on Azure with a compute cluster running version 7.3 LTS.

Currently the cluster is configured with a maximum of 4 workers (minimum 1) of type Standard_D13_v2 and 1 driver of the same type. Several jobs run on this cluster.

We are thinking of using instance pools to improve the initial spin-up time of the cluster. If we set the pool's max capacity to 2, my understanding is that when the cluster tries to scale up towards its 4 max workers, the request will fail and the cluster will never grow.

See: https://learn.microsoft.com/en-gb/azure/databricks/clusters/instance-pools/configure.

Maximum Capacity
 
The maximum number of instances that the pool will provision. If set, this value constrains all instances (idle + used). If a cluster using the pool requests more instances than this number during autoscaling, the request will fail with an INSTANCE_POOL_MAX_CAPACITY_FAILURE error.

Another colleague tells me that the cluster will still scale up to 4 worker nodes, just with a slower start time (essentially acting as if the pool did not exist, but only for those extra workers). Can you please advise?

Further to the above example: to accommodate the current setup of 4 max workers and 1 driver, we would actually need to set the max capacity to 5 (4 workers + 1 driver). Can you please confirm my understanding, or am I missing something?
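To make the sizing arithmetic concrete, here is a minimal sketch of the two checks discussed above. It only models the rule in the documentation excerpt quoted in this post (max capacity constrains idle + used instances) and assumes the driver is drawn from the same pool as the workers. The function names are hypothetical and not part of any Databricks API.

```python
from typing import Optional


def required_pool_capacity(max_workers: int, drivers: int = 1) -> int:
    """Instances a single cluster holds at peak: every worker plus the driver."""
    return max_workers + drivers


def scale_up_fits(pool_max_capacity: Optional[int],
                  instances_in_use: int,
                  additional_requested: int) -> bool:
    """Return True if a scale-up request stays within the pool's max capacity.

    Models only the quoted rule: the cap covers all instances in the pool.
    Idle instances are handed out before new ones are created, so comparing
    the resulting in-use count against the cap is sufficient here.
    A cap of None stands for "max capacity not set".
    """
    if pool_max_capacity is None:
        return True
    return instances_in_use + additional_requested <= pool_max_capacity


# Cluster at minimum size: 1 driver + 1 worker in use, wants 3 more workers.
print(required_pool_capacity(max_workers=4))
print(scale_up_fits(pool_max_capacity=2, instances_in_use=2, additional_requested=3))
print(scale_up_fits(pool_max_capacity=5, instances_in_use=2, additional_requested=3))
```

Under these assumptions a cap of 2 cannot accommodate the scale-up, while a cap of 5 can.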

1 ACCEPTED SOLUTION

Lakshay
Databricks Employee

Hi @EDDatabricks, if you set the maximum capacity of the pool to 5, the pool will not be able to grow beyond 5 instances. Also, the maximum pool capacity is the total number of instances, including both the driver and the workers. So 5 instances means 4 workers and 1 driver.

That said, you don't necessarily need to configure a maximum capacity for the pool. Leaving it unset allows the pool to pull as many instances as needed. The purpose of the maximum capacity is to make sure the pool does not create instances beyond a certain limit, so that it does not cost more than expected.
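As a rough illustration of the sizing above, the sketch below builds a pool definition and a matching cluster definition as plain Python dictionaries. The field names follow my understanding of the Instance Pools and Clusters REST API request bodies, so verify them against the current API reference before use. The pool name, cluster name, pool ID placeholder, idle settings, and runtime key are assumptions for illustration and do not come from this thread.

```python
import json

MAX_WORKERS = 4   # from the question: cluster max workers
MIN_WORKERS = 1   # from the question: cluster min workers
DRIVERS = 1       # one driver, assumed to come from the same pool

pool_spec = {
    "instance_pool_name": "jobs-pool",            # hypothetical name
    "node_type_id": "Standard_D13_v2",            # node type from the question
    "min_idle_instances": 1,                      # assumption: keep one warm instance
    "idle_instance_autotermination_minutes": 30,  # assumption
    # Total instances (idle + used): workers plus driver.
    # Omit this key entirely to leave the pool uncapped.
    "max_capacity": MAX_WORKERS + DRIVERS,
}

cluster_spec = {
    "cluster_name": "jobs-cluster",               # hypothetical name
    "spark_version": "7.3.x-scala2.12",           # assumed key for 7.3 LTS
    "instance_pool_id": "<pool-id>",              # placeholder: ID returned on pool creation
    "autoscale": {"min_workers": MIN_WORKERS, "max_workers": MAX_WORKERS},
}

print(json.dumps(pool_spec, indent=2))
print(json.dumps(cluster_spec, indent=2))
```

If several clusters share the pool, the cap would need to cover the combined peak of all of them, not just this one cluster.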

2 REPLIES

Anonymous
Not applicable

Hi @EDDatabricks,

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
