cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

SQL query takes too long to run

Anonymous
Not applicable

Hi all,

I have a set up SQL query run with 5 hours but the SQL endpoint take too long to start up with each run. Currently I don't know how to fix this ๐Ÿ˜ž

Could you please help me how to improve this?

1 ACCEPTED SOLUTION

Accepted Solutions

Unforgiven
Valued Contributor III
4 REPLIES 4

Debayan
Esteemed Contributor III
Esteemed Contributor III

Hi @Jensen Acklesโ€‹ , Could you please do a tcpdump to the endpoint and check the hops. Also, checking network logs may help too. Also, is the query heavy? Was it working fine before (on-time)?

Anonymous
Not applicable

Actually, sometime it works ok, sometime it takes too long. BTW, I will get a tcpdump to check.

Tks

Unforgiven
Valued Contributor III

It's possible the connectivity to hive metastore is causing the delay here. When there is a high degree of concurrency and contention for metastore access. Interactive clusters in DBR are configured to use up to 5 (spark.databricks.hive.metastore.client.pool.size) hive clients. So if there are more than 5 concurrently running queries that are accessing the hive for a longer time, then there could be slowness.

The easy solution to try is to increase "spark.databricks.hive.metastore.client.pool.size" . Try increasing to 32 and see if there is an improvement.

Unforgiven
Valued Contributor III

or refer on this : Faster SQL Queries on Delta Lake with Dynamic File Pruning

https://www.databricks.com/blog/2020/04/30/faster-sql-queries-on-delta-lake-with-dynamic-file-prunin...