Hello Karthik, many thanks for your question. Databricks SQL Warehouses use dynamic concurrency to handle varying demands. Unlike static-capacity warehouses, Databricks SQL adjusts compute resources in real time to manage concurrent loads and maximize throughput. Each warehouse size category has a fixed compute capacity per unit, but the system scales the number of resources to accommodate varying demands.
For any warehouse type, you choose a cluster size for its compute resources. Databricks recommends a cluster for every 10 concurrent queries. The maximum number of queries in a queue for all SQL warehouse types is 1000. The upscaling of clusters per warehouse is based on query throughput, the rate of incoming queries, and the queue size.
Here is a summary of the scaling behavior:
- If the expected query load is less than 2 minutes, Databricks does not upscale.
- If the expected query load is between 2 to 6 minutes, Databricks adds 1 cluster.
- If the expected query load is between 6 to 12 minutes, Databricks adds 2 clusters.
- If the expected query load is between 12 to 22 minutes, Databricks adds 3 clusters.
- For every additional 15 minutes of expected query load beyond 22 minutes, Databricks adds 1 more cluster.
Additionally, a warehouse is always upscaled if a query waits for 5 minutes in the queue. If the load is low for 15 minutes, Databricks downscales the SQL warehouse, keeping enough clusters to handle the peak load over the last 15 minutes.
For serverless SQL warehouses, intelligent workload management (IWM) dynamically manages workloads by using machine learning models to predict the resource demands of incoming queries while monitoring the warehouseโs available compute capacity in real time. This allows for rapid upscaling to maintain low latency and quick downscaling to minimize costs when demand is low.
You can refer to: https://docs.databricks.com/en/compute/sql-warehouse/warehouse-behavior.html