cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

(Big) Problem with SQL Warehouse Auto stop

leo-machado
New Contributor III

Long story short, I'm not sure if this is an already known problem, but the Auto Stop feature on SQL Warehouses after minutes of inactivity is not working properly.

We started using SQL Warehouses more aggressively this December when we scaled up one of our operations (this process uses a GoLang connector) and noticed it was getting pretty expensive.

After some investigation, we found this behavior:

image (2).png

Even though queries were finishing running and there was no activity in the cluster, the cluster down did not go down, as if it was still running.

To make sure it was not a problem with the visualization, I've checked warehouse events on the system.compute.warehouse_events table, and it was, in fact, not turning the warehouse off.

In one particular example, this is the log for the warehouse:

STARTING 0 2024-12-25T03:02:46.198+00:00
STOPPING 1 2024-12-25T04:07:09.253+00:00

And these were the only queries executed during that period (start time and end time):

2024-12-25T03:15:44.147+00:00 2024-12-25T03:15:49.266+00:00
2024-12-25T03:56:27.947+00:00 2024-12-25T03:56:28.063+00:00

This warehouse is configured to auto-stop after 1 minute of inactivity, which did NOT happen. In order to test the cost impact of this, I've designed a quick code using Serverless and AWS Lambda to force warehouse stopping when there are not queries running/queued.

These are the results in terms of warehouse costs:

Screenshot 2025-01-02 at 10.31.27.png

Basically a drop from $40/day to $10/day less.

I'm not sure what is causing the warehouse to stay "on", but I'm guessing it is interpreting JDBC connections as "active sessions" even though they're not running any queries - anyway, this is a serious problem in my opinion.

For those of you that want to copy the approach, the code can be found here: https://github.com/lmachado-sousa/databricks-sql-warehouse-terminator

9 REPLIES 9

Alberto_Umana
Databricks Employee
Databricks Employee

Hi @leo-machado,

This problem likely happen because of active sessions on the JDBC connection to the warehouse as you pointed out.

This kind of issues required more investigations, could you please share your warehouseID and your workspaceID on a DIM message I can check our backend logs to understand the behavior. 

leo-machado
New Contributor III

@Alberto_Umana, I've sent you the IDs as requested in a direct message

To point out, we did investigate the active connection hypothesis and made sure our code was always closing the connection - our jobs take 10 min at most to finish as well - the problem continued even after making this check.

Alberto_Umana
Databricks Employee
Databricks Employee

Thanks Leo, I received the details, I will review it once I have some time and get back! 

So, currently behavior it's not happening because you implemented the custom logic, right? 

leo-machado
New Contributor III

Correct, the lambda terminator solved the problem in the short term

pdiamond
Contributor

Is this still being investigated by Databricks? I'm seeing similar behavior that's costing us a lot of money.

leo-machado
New Contributor III

Hi! @pdiamond as of today, no one has returned on this topic and I'm still using the lambda terminator to make sure the clusters are terminated when no query is running - the issue is still present.

Advika
Databricks Employee
Databricks Employee

Hello @leo-machado and @pdiamond!

Thanks for sharing the details and your workaround.
I understand the seriousness of the issue and will reach out to the relevant team internally. Iโ€™ll follow up on this thread as soon as I have an update. Appreciate your patience in the meantime!

pdiamond
Contributor

One thing we noticed is that the "Monitoring" tab of the SQL Warehouse page can be misleading because it doesn't always account for time spent fetching & serving the data to the client, only the actual query execution time. This was confirmed by Support and they also said enhancments to the Monitoring page were being developed to address it.

leo-machado
New Contributor III

Makes sense! The go connector paginates the results and it takes a while longer to fetch everything - we did stop using pagination and started using the INSERT OVERWRITE DIRECTORY statement and got better cost efficient results.

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now