cancel
Showing results for 
Search instead for 
Did you mean: 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results for 
Search instead for 
Did you mean: 

How do I dimension my DBSQL warehouse correctly?

anardinelli
Databricks Employee
Databricks Employee

What is the optimal number of cluster/nodes in a warehouse?

Depends on your workload. Our DBSQL guide suggests a size range on two main things: time to execute your query and bytes spilled from it. This link can help you understand better optimization techniques for DBSQL workloads and better scaling practices.

How does SQL warehouse pull in data prior to running queries on it?

Warehouses uses a scheduler to distribute queries for each cluster. Once a cluster is able to take it, it will send it for processing. Each cluster can take, at maximum, 10 queries sent. To better check if your queries are being scheduled and running, you can check the “Monitoring” tab of your SQL warehouse. Warehouses seems to "warm-up" because they set up necessary infrastructure for the runs, and also cache metadata from the tables. With this, they are able to speed up any other runs that they already mapped out the schema.

Is it 1 cluster node per query?

Not necessarily. This is managed internally by the scheduler which acts like a load balancer and it will search for the best cluster availability while talking to the cluster manager. Since a cluster can take up to 10 queries, executors can handle different ones depending on the load and plans generated.

Should I take data partitioning into consideration when composing SQL queries?

Yes. Once your query is submitted, it ends like any Spark operation that you regularly do with SQL on a Notebook, being broadcasted to workers so they can do the heavy lifting in a paralleled manner. Partitioning your tables, and making sure filters are applied on them, can speed up significantly the performance. DBSQL clusters also have Photon enabled, which optimizes even more your query plans. 

Using the "Serverless" type of Warehouse can speed up the process since it boots way faster than "Classic" and "Pro" warehouses.

0 REPLIES 0

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group