What is the optimal number of cluster/nodes in a warehouse?
It depends on your workload. Our DBSQL guide bases its sizing recommendations on two main factors: the time your queries take to execute and the bytes they spill to disk. This link can help you understand optimization techniques for DBSQL workloads and better scaling practices.
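To make the two-factor guidance concrete, here is a minimal sketch of a sizing heuristic. The thresholds and the function name are illustrative assumptions, not a Databricks API:

```python
# Hypothetical sizing heuristic: thresholds and names are illustrative,
# not part of any Databricks API.

def suggest_size_action(exec_time_s: float, bytes_spilled: int) -> str:
    """Suggest a warehouse size change based on the two metrics
    the DBSQL sizing guidance focuses on."""
    if bytes_spilled > 0:
        # Any spill to disk means memory was exhausted: size up.
        return "increase warehouse size"
    if exec_time_s > 60:
        # Long queries with no spill may benefit from more clusters.
        return "consider adding clusters (scale out)"
    return "current size looks adequate"

print(suggest_size_action(30, 10_000))  # spill observed -> size up
print(suggest_size_action(120, 0))      # long query, no spill -> scale out
```

The actual guidance in the DBSQL docs is more nuanced; the point of the sketch is just that spill and execution time drive two different knobs (size vs. cluster count).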
How does SQL warehouse pull in data prior to running queries on it?
Warehouses use a scheduler to distribute queries across their clusters. Once a cluster is able to take a query, the scheduler sends it there for processing. Each cluster can run at most 10 queries concurrently. To check whether your queries are being scheduled and run, see the "Monitoring" tab of your SQL warehouse. Warehouses appear to "warm up" because they provision the necessary infrastructure for query runs and also cache table metadata; subsequent runs against schemas they have already mapped are therefore faster.
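The metadata-caching part of the warm-up can be pictured as a simple cache in front of the metastore. This is an illustrative sketch, with hypothetical names, not the Databricks internals:

```python
# Illustrative sketch of why warm warehouses respond faster: a simple
# in-memory metadata cache. Names are hypothetical.

class MetadataCache:
    def __init__(self):
        self._schemas = {}
        self.misses = 0

    def get_schema(self, table: str) -> dict:
        if table not in self._schemas:
            self.misses += 1                       # cold: fetch from metastore
            self._schemas[table] = self._fetch(table)
        return self._schemas[table]                # warm: served from memory

    def _fetch(self, table: str) -> dict:
        # Stand-in for a metastore round trip.
        return {"table": table, "columns": ["id", "value"]}

cache = MetadataCache()
cache.get_schema("sales")   # first query: cache miss, slower
cache.get_schema("sales")   # repeat query: served from cache
print(cache.misses)         # -> 1
```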
Is it 1 cluster node per query?
Not necessarily. This is managed internally by the scheduler, which acts like a load balancer: it talks to the cluster manager to find the cluster with the best availability. Since a cluster can take up to 10 queries, its executors can handle different queries concurrently depending on the load and the query plans generated.
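The routing behavior described above can be sketched as a least-loaded scheduler with a per-cluster cap of 10 concurrent queries. This is an illustration of the idea, not the actual DBSQL scheduler:

```python
# Sketch of a least-loaded scheduler with a per-cluster cap of 10
# concurrent queries. Illustrative only, not the real DBSQL scheduler.

MAX_QUERIES_PER_CLUSTER = 10

def assign_query(cluster_loads):
    """Return the index of the least-loaded cluster with spare capacity,
    or None if every cluster is at the cap (the query waits in queue)."""
    candidates = [i for i, load in enumerate(cluster_loads)
                  if load < MAX_QUERIES_PER_CLUSTER]
    if not candidates:
        return None
    return min(candidates, key=lambda i: cluster_loads[i])

loads = [10, 3, 7]              # queries currently running on each cluster
print(assign_query(loads))      # -> 1 (most headroom)
print(assign_query([10, 10]))   # -> None (all at cap; query queues)
```

A full cluster is skipped entirely, which is why a single busy warehouse can queue queries even when it looks "up" in the UI.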
Should I take data partitioning into consideration when composing SQL queries?
Yes. Once your query is submitted, it runs like any Spark SQL operation you would run in a notebook: it is distributed to workers, which do the heavy lifting in parallel. Partitioning your tables, and making sure filters are applied on the partition columns, can significantly improve performance. DBSQL clusters also have Photon enabled, which further optimizes your query plans.
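Why filtering on partition columns helps can be shown with a toy model of partition pruning: when the filter matches the partition column, only the relevant partitions are read at all. The layout and names here are hypothetical:

```python
# Toy illustration of partition pruning. The partition layout is
# hypothetical; real tables store partitions as directories of files.

partitions = {
    "date=2024-01-01": [("a", 1), ("b", 2)],
    "date=2024-01-02": [("c", 3)],
    "date=2024-01-03": [("d", 4)],
}

def scan(filter_date=None):
    """Read only partitions matching the filter; full scan otherwise."""
    scanned = 0
    rows = []
    for part, data in partitions.items():
        if filter_date and part != f"date={filter_date}":
            continue                    # pruned: this partition is never read
        scanned += 1
        rows.extend(data)
    return rows, scanned

rows, scanned = scan("2024-01-02")
print(scanned)   # -> 1 partition scanned instead of 3
```

A filter on a non-partition column would not prune anything: all three partitions would still be read, then filtered row by row.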
Using the "Serverless" warehouse type can also speed things up, since it starts much faster than "Classic" and "Pro" warehouses.