11-17-2022 07:47 PM
Hi all,
I have a SQL query set up to run every 5 hours, but the SQL endpoint takes too long to start up on each run. I don't currently know how to fix this.
Could you please help me improve this?
11-17-2022 10:57 PM
Hi @Jensen Ackles, could you please run a tcpdump against the endpoint and check the hops? Checking the network logs may help too. Also, is the query heavy, and was it running on time before?
11-17-2022 11:40 PM
Actually, sometimes it works fine and sometimes it takes too long. By the way, I will get a tcpdump to check.
Thanks
11-21-2022 08:52 PM
It's possible that connectivity to the Hive metastore is causing the delay here, particularly when there is a high degree of concurrency and contention for metastore access. Interactive clusters in DBR are configured to use up to 5 Hive clients (spark.databricks.hive.metastore.client.pool.size). So if more than 5 concurrently running queries are accessing the Hive metastore for an extended time, there could be slowness.
The easy solution to try is to increase "spark.databricks.hive.metastore.client.pool.size". Try increasing it to 32 and see if there is an improvement.
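To make the suggestion above concrete, here is a rough sketch. It assumes the setting is entered in the cluster's Spark config as a "key value" line, and it uses a deliberately simplified model in which every query holds one metastore client for about the same time, so queries are served in "waves" of at most pool-size at a time. The helper names (spark_conf_line, metastore_waves) and the figure of 12 concurrent queries are made up for illustration and are not part of any Databricks API. Whether a SQL endpoint accepts this setting may depend on your workspace.

```python
import math

# Config key named in the reply above.
POOL_SIZE_KEY = "spark.databricks.hive.metastore.client.pool.size"


def spark_conf_line(key: str, value: int) -> str:
    """Format one entry as a 'key value' line (hypothetical helper)."""
    return f"{key} {value}"


def metastore_waves(concurrent_queries: int, pool_size: int) -> int:
    """Number of rounds needed to serve all queries under the simplified
    assumption that each query holds one client for about the same time."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return math.ceil(concurrent_queries / pool_size)


if __name__ == "__main__":
    # Line to try in the cluster's Spark config.
    print(spark_conf_line(POOL_SIZE_KEY, 32))
    # Compare the default pool of 5 with the suggested 32 for 12 queries.
    for pool in (5, 32):
        print(f"pool={pool}: {metastore_waves(12, pool)} wave(s)")
```

With 12 concurrent queries in this model, a pool of 5 needs three rounds of metastore access while a pool of 32 needs one, which is the intuition behind raising the value. Real behaviour depends on how long each query actually holds a client.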
11-21-2022 08:54 PM
Or refer to this: Faster SQL Queries on Delta Lake with Dynamic File Pruning