Databricks pools are a set of idle, ready-to-use instances. When a cluster is attached to a pool, its nodes are created from the pool's idle instances. If the pool has no idle instances, it expands by allocating a new instance from the instance provider to accommodate the cluster's request. When a cluster releases an instance, the instance returns to the pool and is free for another cluster to use. Databricks does not charge DBUs while instances sit idle in the pool, which saves cost. However, cloud provider infrastructure costs still apply to those idle instances.
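The acquire/release cycle described above can be pictured with a small toy model. This is only an illustrative sketch, not Databricks code: the `ToyPool` class and its method names are made up for this example, and it ignores real-world details such as maximum capacity and idle auto-termination.

```python
class ToyPool:
    """Toy model of a pool: hands out idle instances first, otherwise provisions new ones."""

    def __init__(self, idle=0):
        self.idle = idle          # instances sitting ready in the pool
        self.in_use = 0           # instances currently held by clusters
        self.provisioned = 0      # instances requested from the cloud provider

    def acquire(self):
        """A cluster asks for one instance; report where it came from."""
        if self.idle > 0:
            self.idle -= 1
            source = "idle"       # fast path: reuse a ready instance
        else:
            self.provisioned += 1
            source = "new"        # slow path: the pool expands
        self.in_use += 1
        return source

    def release(self):
        """A cluster gives an instance back; it becomes idle for the next cluster."""
        if self.in_use == 0:
            raise ValueError("no instance is in use")
        self.in_use -= 1
        self.idle += 1
```

In this model, an instance released by one cluster is the one the next `acquire()` call picks up, which is where the start-up time saving comes from.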
For the Min Idle setting, the recommendation is to set it to 0 so you don't pay for running instances that aren't doing work. The tradeoff is that a cluster may take longer to start when it has to acquire a new instance. If you only run interactive workloads during business hours, make sure the pool's "Min Idle" instance count is set to zero after hours. If your automated data pipeline runs for a few hours at night, raise the "Min Idle" count a few minutes before the pipeline starts and revert it to zero afterwards.
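The scheduling idea above could be sketched roughly as follows. The helper names (`min_idle_for`, `build_edit_payload`), the 10-minute lead time, and the example values are assumptions for illustration. The payload fields follow the Instance Pools REST API edit request (`POST /api/2.0/instance-pools/edit`) as I understand it, so verify them against the current documentation before relying on this.

```python
from datetime import time


def _minutes(t):
    """Minutes since midnight for a datetime.time."""
    return t.hour * 60 + t.minute


def min_idle_for(now, pipeline_start, pipeline_end, warm_count, lead_minutes=10):
    """Return the Min Idle count that should be in effect at `now`.

    The pool is kept warm from `lead_minutes` before the pipeline starts
    until it ends; outside that window Min Idle is 0.
    """
    warm_from = (_minutes(pipeline_start) - lead_minutes) % (24 * 60)
    warm_until = _minutes(pipeline_end)
    current = _minutes(now)
    if warm_from <= warm_until:
        in_window = warm_from <= current < warm_until
    else:
        # The warm window wraps past midnight (e.g. 23:00 to 02:00).
        in_window = current >= warm_from or current < warm_until
    return warm_count if in_window else 0


def build_edit_payload(pool_id, pool_name, node_type_id, min_idle):
    """Build the JSON body for a pool edit request (field names assumed from the REST API)."""
    return {
        "instance_pool_id": pool_id,
        "instance_pool_name": pool_name,
        "node_type_id": node_type_id,
        "min_idle_instances": min_idle,
    }
```

A scheduled job could call `min_idle_for(...)` every few minutes and send the resulting payload only when the value changes. The actual HTTP call and authentication are left out here on purpose.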
As for best practices, the right pool layout depends on your specific use case. If your driver node and worker nodes have different requirements, create a separate pool for each. You can minimize instance acquisition time by creating a pool for each instance type and Databricks runtime your organization commonly uses. For example, if most data engineering clusters use instance type A, data science clusters use instance type B, and analytics clusters use instance type C, create one pool per instance type. Also, consider using spot instances to reduce costs, and on-demand instances for jobs with short execution times and strict execution time requirements.
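As a rough sketch of the "one pool per instance type" layout, the snippet below builds one pool definition per workload. The function name, the pool naming scheme, and the placeholder node types and runtime string are hypothetical. The field names (`instance_pool_name`, `node_type_id`, `min_idle_instances`, `preloaded_spark_versions`, and the AWS-only `aws_attributes.availability`) are assumed from the Instance Pools REST API create request and should be checked against the documentation for your cloud.

```python
def build_pool_specs(node_types, runtime, spot_workloads=()):
    """Build one pool definition per workload.

    node_types:      mapping of workload name -> instance type
    runtime:         Databricks runtime version string to preload
    spot_workloads:  workloads that tolerate spot interruption; all others
                     get on-demand instances
    """
    specs = []
    for workload, node_type_id in node_types.items():
        availability = "SPOT" if workload in spot_workloads else "ON_DEMAND"
        specs.append({
            "instance_pool_name": f"{workload}-pool",
            "node_type_id": node_type_id,
            "min_idle_instances": 0,  # raise only around known busy periods
            "preloaded_spark_versions": [runtime],
            # AWS-specific field; other clouds use a different attribute block.
            "aws_attributes": {"availability": availability},
        })
    return specs


# Placeholder values standing in for instance types A, B, and C.
example_specs = build_pool_specs(
    {
        "data-engineering": "instance-type-A",
        "data-science": "instance-type-B",
        "analytics": "instance-type-C",
    },
    runtime="example-runtime-version",
    spot_workloads=("data-science",),
)
```

Each dictionary would be the body of one pool create request. Keeping driver and worker pools separate would simply mean adding more entries to the mapping.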
https://www.databricks.com/blog/2019/11/11/databricks-pools-speed-up-data-pipelines.html