<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks notebook sometime takes too long to run query (even on empty table) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19365#M12968</link>
    <description>&lt;P&gt;Probably the cluster is constantly in use, so your query has to wait behind queries that are already being processed, or the cluster auto-terminates between uses and has to start up again each time.&lt;/P&gt;</description>
    <pubDate>Fri, 02 Dec 2022 18:03:48 GMT</pubDate>
    <dc:creator>j_afanador</dc:creator>
    <dc:date>2022-12-02T18:03:48Z</dc:date>
    <item>
      <title>Databricks notebook sometime takes too long to run query (even on empty table)</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19360#M12963</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Sometimes I notice that running a query takes too long, even for simple queries, and the next time I run the same query it runs much faster. I have a cluster running (DBR 10.4 LTS • 5 workers) and it constantly has several workers.&lt;/P&gt;&lt;P&gt;An example is a simple select on a table that I truncated beforehand, so I know it is empty, and I do something like:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Count the rows of a table that was truncated beforehand (expected: 0)
df = spark.sql(
  """
select count(*) from table_name
  """
)

display(df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The first time it took 1.3 minutes, and running it again took 0.6 sec.&lt;/P&gt;&lt;P&gt;It seems to happen quite often, as if the query were waiting for something to start even though everything should already be up and running.&lt;/P&gt;&lt;P&gt;Do you have an explanation for this behavior, and is there anything I can do about it?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2022 13:04:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19360#M12963</guid>
      <dc:creator>Retko</dc:creator>
      <dc:date>2022-12-01T13:04:08Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks notebook sometime takes too long to run query (even on empty table)</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19361#M12964</link>
      <description>&lt;P&gt;Hi @Retko Okter&lt;/P&gt;&lt;P&gt;Two things might answer your question.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;When you call an action for the first time, the table's data gets Delta cached: copies of the files are stored on the local nodes' storage, which is why subsequent queries run much faster.&lt;/LI&gt;&lt;LI&gt;This might sound silly, but if you enabled autoscaling for the cluster, check the cluster's event log to see whether the cluster is upscaling or downscaling. While the cluster is acquiring or removing nodes, your query gets delayed.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;&lt;P&gt;Cheers.&lt;/P&gt;</description>
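One way to see the first-run versus repeated-run pattern discussed in this reply is to time the same action several times in a row. The helper below is only a sketch: `time_runs` is a hypothetical name, not part of Spark or Databricks, and it measures wall-clock time without telling you whether caching or autoscaling caused a slow first run.

```python
import time


def time_runs(action, runs=3):
    """Call `action` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        action()  # the work being measured
        durations.append(time.perf_counter() - start)
    return durations


# In a notebook attached to a cluster, the action could be the query from the
# question (assumes the `spark` session and the table `table_name` exist):
#
#   time_runs(lambda: spark.sql("select count(*) from table_name").collect())
```

If only the first duration is large, comparing its timestamp with the cluster's event log can show whether a resize was in progress at that moment.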
      <pubDate>Thu, 01 Dec 2022 13:11:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19361#M12964</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-12-01T13:11:22Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks notebook sometime takes too long to run query (even on empty table)</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19362#M12965</link>
      <description>&lt;P&gt;Are you sure you are the only person using the cluster?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2022 13:11:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19362#M12965</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-12-01T13:11:57Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks notebook sometime takes too long to run query (even on empty table)</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19363#M12966</link>
      <description>&lt;P&gt;I agree with @Retko Okter.&lt;/P&gt;&lt;P&gt;To support the second point, here is how the two autoscaling modes behave:&lt;/P&gt;&lt;P&gt;&lt;B&gt;Optimized autoscaling&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Scales up from min to max in 2 steps.&lt;/LI&gt;&lt;LI&gt;Can scale down even if the cluster is not idle, by looking at shuffle file state.&lt;/LI&gt;&lt;LI&gt;Scales down based on a percentage of current nodes.&lt;/LI&gt;&lt;LI&gt;On job clusters, scales down if the cluster is underutilized over the last 40 seconds.&lt;/LI&gt;&lt;LI&gt;On all-purpose clusters, scales down if the cluster is underutilized over the last 150 seconds.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;B&gt;Standard autoscaling&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Starts by adding 8 nodes. Thereafter, scales up exponentially, but can take many steps to reach the max.&lt;/LI&gt;&lt;LI&gt;Scales down only when the cluster is completely idle and has been underutilized for the last 10 minutes.&lt;/LI&gt;&lt;LI&gt;Scales down exponentially, starting with 1 node.&lt;/LI&gt;&lt;/UL&gt;</description>
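The scale-down windows quoted in this reply can be written down as a small lookup, which makes it easier to reason about when an all-purpose cluster might have dropped workers between two notebook runs. This is a simplified model built only from the numbers above; the names `SCALE_DOWN_WINDOW_SECONDS` and `may_scale_down` are hypothetical and do not correspond to any Databricks API.

```python
# Underutilization windows in seconds, taken from the lists in the post above.
SCALE_DOWN_WINDOW_SECONDS = {
    ("optimized", "job"): 40,
    ("optimized", "all-purpose"): 150,
    ("standard", "job"): 600,          # 10 minutes
    ("standard", "all-purpose"): 600,  # 10 minutes
}


def may_scale_down(autoscaling, cluster_type, underutilized_seconds,
                   completely_idle=False):
    """Return True if, under this simplified model, a scale-down could start."""
    window = SCALE_DOWN_WINDOW_SECONDS[(autoscaling, cluster_type)]
    # Standard autoscaling additionally requires the cluster to be completely idle.
    if autoscaling == "standard" and not completely_idle:
        return False
    return underutilized_seconds >= window


# Example: an all-purpose cluster with optimized autoscaling that has been
# underutilized for three minutes (180 s) is past its 150 s window.
print(may_scale_down("optimized", "all-purpose", 180))
```

Under these assumptions, a pause of a few minutes between notebook cells is enough for an all-purpose cluster with optimized autoscaling to begin removing workers, so the next query may have to wait for the cluster to scale up again.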
      <pubDate>Thu, 01 Dec 2022 13:33:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19363#M12966</guid>
      <dc:creator>Harun</dc:creator>
      <dc:date>2022-12-01T13:33:40Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks notebook sometime takes too long to run query (even on empty table)</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19364#M12967</link>
      <description>&lt;P&gt;Hey @Retko Okter, if it's an all-purpose cluster and multiple users are using it, then the workload may be high and results can take longer.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 04:50:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19364#M12967</guid>
      <dc:creator>Geeta1</dc:creator>
      <dc:date>2022-12-02T04:50:21Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks notebook sometime takes too long to run query (even on empty table)</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19365#M12968</link>
      <description>&lt;P&gt;Probably the cluster is constantly in use, so your query has to wait behind queries that are already being processed, or the cluster auto-terminates between uses and has to start up again each time.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 18:03:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-notebook-sometime-takes-too-long-to-run-query-even-on/m-p/19365#M12968</guid>
      <dc:creator>j_afanador</dc:creator>
      <dc:date>2022-12-02T18:03:48Z</dc:date>
    </item>
  </channel>
</rss>

