12-01-2022 05:04 AM
Hi,
Sometimes I notice that running a query takes too long, even for simple queries, and the next time I run the same query it runs much faster. I have a running cluster (DBR 10.4 LTS, 5 workers) and it always has several workers up.
An example is a simple select on a table that I truncated beforehand, so I know it is empty. I run something like:
# Count the rows of the table that was truncated earlier
df = spark.sql(
f"""
select count(*) from table_name
"""
)
display(df)
The first time it took 1.3 minutes, and running it again took 0.6 seconds.
This seems to happen quite often, as if the query were waiting for something to start even though the cluster should already be up and running.
Do you have an explanation for this behavior, and is there anything I can do to avoid it?
Thank you!
12-01-2022 05:11 AM
Hi @Retko Okter,
Two things might answer your question.
Hope this helps.
Cheers..
12-01-2022 05:33 AM
I agree with @Retko Okter.
To support the second point, see the explanation below:
Optimized autoscaling
Standard autoscaling
12-01-2022 05:11 AM
Are you sure you are the only person using the cluster?
12-01-2022 08:50 PM
Hey @Retko Okter, if it's an all-purpose cluster and multiple users are using it, then the workload may be high and results can take longer to come back.
12-02-2022 10:03 AM
Probably the cluster is always in use and your query ends up waiting in the processing queue, or the cluster auto-stops between the times you use it.
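One way to narrow down which of these explanations applies is to time the same query several times in a row and compare the numbers. The snippet below is only a rough sketch: `time_runs` is a made-up helper name, not a Databricks or Spark API, and the commented usage assumes a notebook where `spark` is already defined and `table_name` is the table from the question.

```python
import time
from typing import Callable, List


def time_runs(action: Callable[[], object], runs: int = 3) -> List[float]:
    """Run `action` `runs` times and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        action()  # the work being measured
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical usage in a notebook (assumes `spark` exists there):
# durations = time_runs(
#     lambda: spark.sql("select count(*) from table_name").collect()
# )
# print([f"{d:.2f}s" for d in durations])
```

As a rule of thumb, if only the first run is slow, the delay looks more like a one-off start-up or warm-up cost. If slow runs show up at random, other workloads on the same cluster are a more likely cause.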