Databricks Community

Anonymous · ‎11-17-2022

Hi all,

I have a SQL query scheduled to run every 5 hours, but the SQL endpoint takes too long to start up on each run. I don't currently know how to fix this 😞

Could you please help me improve this?

Unforgiven · ‎11-21-2022

Or refer to this: Faster SQL Queries on Delta Lake with Dynamic File Pruning

https://www.databricks.com/blog/2020/04/30/faster-sql-queries-on-delta-lake-with-dynamic-file-prunin...


Debayan · ‎11-17-2022

Hi @Jensen Ackles, could you please run a tcpdump against the endpoint and check the hops? Checking the network logs may help too. Also, is the query heavy? Was it working fine (on time) before?

Anonymous · ‎11-17-2022

Actually, sometimes it works OK and sometimes it takes too long. BTW, I will get a tcpdump to check.

Thanks

Unforgiven · ‎11-21-2022

It's possible that connectivity to the Hive metastore is causing the delay here, particularly when there is a high degree of concurrency and contention for metastore access. Interactive clusters in DBR are configured to use up to 5 Hive clients (spark.databricks.hive.metastore.client.pool.size). So if more than 5 concurrently running queries access the Hive metastore for a long time, there could be slowness.

The easy solution to try is to increase "spark.databricks.hive.metastore.client.pool.size". Try increasing it to 32 and see if there is an improvement.
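
For anyone who manages cluster definitions as JSON/dicts rather than through the UI, here is a rough sketch of how the setting above could be added. It assumes an interactive cluster whose Spark config you are allowed to edit, and that the cluster spec carries a "spark_conf" map of string keys to string values. The function name, the example cluster name, and the example spec are made up for illustration; only the config key and the value 32 come from the reply above.

```python
import copy

# Config key quoted in the reply above.
POOL_SIZE_KEY = "spark.databricks.hive.metastore.client.pool.size"


def with_metastore_pool_size(cluster_spec, pool_size=32):
    """Return a copy of cluster_spec with the metastore client pool size set.

    cluster_spec is assumed to be a plain dict with an optional "spark_conf"
    map. The input dict is not modified.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be a positive integer")
    updated = copy.deepcopy(cluster_spec)
    spark_conf = updated.setdefault("spark_conf", {})
    # Spark config values are stored as strings.
    spark_conf[POOL_SIZE_KEY] = str(pool_size)
    return updated


# Hypothetical spec, for illustration only.
example_spec = {
    "cluster_name": "example-interactive-cluster",
    "spark_conf": {"spark.sql.shuffle.partitions": "200"},
}

new_spec = with_metastore_pool_size(example_spec)
for key, value in sorted(new_spec["spark_conf"].items()):
    # One "key value" pair per line.
    print(key, value)
```

The cluster would need a restart for a changed Spark config to take effect. Whether this helps depends on the metastore really being the bottleneck, so it is worth comparing run times before and after.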