Databricks Community

jerry747847 · ‎10-11-2022

Hello experts,

For the below question, I am trying to understand why option C was selected instead of B? As B would also have resolved the issue

Question 40

A data analyst has noticed that their Databricks SQL queries are running too slowly. They claim that this issue is affecting all of their sequentially run queries. They ask the data engineering team for help. The data engineering team notices that each of the queries uses the same SQL endpoint, but the SQL endpoint is not used by any other user. Which of the following approaches can the data engineering team use to improve the latency of the data analyst’s queries?

A. They can turn on the Serverless feature for the SQL endpoint.

B. They can increase the maximum bound of the SQL endpoint’s scaling range.

C. They can increase the cluster size of the SQL endpoint.

D. They can turn on the Auto Stop feature for the SQL endpoint.

E. They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to “Reliability Optimized.”

This is the link for the practice exam

Anonymous · ‎10-13-2022

The key is sequential. 2 has to wait for 1. The autoscaling is triggered by jobs sitting in the queue. 2 won't be in the queue yet because it's waiting for 1 to finish and then it gets added.

If all the queries were submitted at the same time, then more autoscaling would help.

View solution in original post

Anonymous · ‎10-12-2022

I think the key is the word sequential. If the queries are sequential, they won't trigger autoscaling. For sequential, you'd want a larger VM. Autoscaling will be triggered by lots of cluster use/many queries.

jerry747847 · ‎10-13-2022

@Joseph Kambourakis Thanks a lot for your response. Yes, I agree with you. By autoscaling, you meant adding new clusters or adding more worker nodes to the existing cluster? Can you please share any references you have on this?

Anonymous · ‎10-13-2022

https://docs.databricks.com/clusters/configure.html#cluster-size-and-autoscaling

Autoscaling is automatic cluster sizing up or down by adding or removing VMs to an existing cluster.

jerry747847 · ‎10-13-2022

@Joseph Kambourakis great thanks.👍 Just to make things clear, here Query 1 is fired first followed by Query2, Query3, etc. like that in sequential order. So if Query 1 runs on say 4 worker nodes of the same VM instead of 2 worker nodes (option B) and similarly for Query 2, Query 3, etc. won't that address the performance issue highlighted in the question

Anonymous · ‎10-13-2022

The key is sequential. 2 has to wait for 1. The autoscaling is triggered by jobs sitting in the queue. 2 won't be in the queue yet because it's waiting for 1 to finish and then it gets added.

If all the queries were submitted at the same time, then more autoscaling would help.

JRL · ‎02-07-2023

On a sql server, there are wait states. Wait states occur when several processors (vCPUs) are processing and several threads are working through the processors. A longer running thread that has dependencies, can cause the thread that may have begun on a different process, but is dependent on the one that has the longer running thread to complete, will cause a wait state and slow down the dependent query. From the sysadmin POV, this could also be seen as hard faults in memory. The issue above from a SQL perspective is to increase the maximum bound which add endpoints (threads) and logically create additional flows - more streams, faster flow. From the sysadmin POV, adding additional clusters to increase the size of the endpoint would also solve the problem because the amount of memory allows for swifter execution. In either case, it should resolve the issue. It would make more sense to allow Azure or AWS to sort this out with a Serverless feature that is Reliably Optimizied. However, that feature will simply perform the feat of scaling the cluster. Assuming AWS/Azure have a handle on it, and their methodology is to scale the cluster, the better answer..., based on their handling of the situation, is to scale upward with the cluster. I suspect the answer here is the Databricks people aren't always thinking like SQL Server database administrators. They have lots of cluster power. Frankly, if it were my issue, I would try the maximum bound increase first. If that didn't work then I'd go for the increased cluster size. But my thinking is a cost optimization. There is not a part of the question that asks how to do this in a cost effective way. So - final answer - C: boost the cluster.

Databricks Community

When to increase maximum bound vs when to increase cluster size?

DAIS 2026 Day 2 — Midweek, Full Speed

DAIS 2026 Day 1 Recap - The Booth, The Crew, The Energy

Solution Accelerator Series | Building a Chatbot With Large Language Models (LLMs)

Build apps without jumping through hoops

Data+AI Summit 2026 | Share Your Moment, Own Your Spotlight!