<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Understanding what impacts &amp;quot;Optimizing query &amp;amp; pruning files&amp;quot; time in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/understanding-what-impacts-quot-optimizing-query-amp-pruning/m-p/141276#M2413</link>
    <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;,&amp;nbsp;thank you for the detailed response - really useful.&amp;nbsp;&lt;/P&gt;&lt;P&gt;We'll work through what looks most relevant to us.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Question 1: Can you confirm of the `compilation_time_ms` in the metrics from the &lt;A href="https://docs.databricks.com/api/workspace/queryhistory/list" target="_blank"&gt;list&lt;/A&gt; query endpoint corresponds to the&amp;nbsp;&lt;SPAN&gt;“Optimizing query &amp;amp; pruning files” time on the query history page.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Question 2: Is there a way to see when autoscaling is firing? Is that simply looking at the "Running cluster" on the warehouse monitoring page?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;PS: It would be great to see your explanation in the docs somewhere.&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 05 Dec 2025 16:03:33 GMT</pubDate>
    <dc:creator>Rennzie</dc:creator>
    <dc:date>2025-12-05T16:03:33Z</dc:date>
    <item>
      <title>Understanding what impacts "Optimizing query &amp; pruning files" time</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/understanding-what-impacts-quot-optimizing-query-amp-pruning/m-p/141192#M2408</link>
      <description>&lt;P&gt;Hi everyone.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm keen to understand what impacts the "&lt;SPAN class=""&gt;Optimizing query &amp;amp; pruning files" times seen in the Query Profile History for a sQL Warehouse query. We're querying against delta tables and cannot find any information about what or when a query will have a long complication and optimisation time. From observation it appears to be random and we'd like a way to mitigate it's effects.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;Any help is much appreciated.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Dec 2025 18:29:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/understanding-what-impacts-quot-optimizing-query-amp-pruning/m-p/141192#M2408</guid>
      <dc:creator>Rennzie</dc:creator>
      <dc:date>2025-12-04T18:29:30Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding what impacts "Optimizing query &amp; pruning files" time</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/understanding-what-impacts-quot-optimizing-query-amp-pruning/m-p/141195#M2409</link>
      <description>&lt;P class="p1"&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/199819"&gt;@Rennzie&lt;/a&gt;&amp;nbsp;, think of the “Optimizing query &amp;amp; pruning files” step as the warm-up routine before the warehouse starts lifting any real weights. In this window, the engine is lining up the play: skipping irrelevant files, compiling the plan, and conducting the necessary security checks before we ever touch the underlying data.&lt;/P&gt;
&lt;P class="p1"&gt;If you check the Query Profile, Databricks separates this phase cleanly from scheduling and execution, so when you see spikes here, you’re looking at pre-scan work — not the scan itself.&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Why the duration jumps around&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P class="p1"&gt;A few patterns consistently show up in the wild:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Cold vs. warm cache&lt;/P&gt;
&lt;P class="p1"&gt;After a warehouse wakes up from suspension, those first couple of queries stretch their legs. Metadata is cold, the cache is empty, and the warehouse needs a moment before it behaves like its usual self. After a few runs, everything tightens up.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Data skipping job scheduling&lt;/P&gt;
&lt;P class="p1"&gt;If the system decides it needs to run a PrepareDeltaScan job to trim the file list, the scheduling delay alone can dominate this phase — especially when the warehouse is already juggling other work. This is why the timing can feel random across identical runs.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Delta Lake metadata size and file fragmentation&lt;/P&gt;
&lt;P class="p1"&gt;Huge logs and a jackpot of tiny files slow down metadata reads and skew pruning times. Teams routinely see this improve immediately after a round of compaction and checkpointing.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Layout alignment&lt;/P&gt;
&lt;P class="p1"&gt;If your data isn’t physically laid out to support your predicates, pruning becomes less effective — and the optimization phase spends more time figuring out what it can safely ignore.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;What you can do about it&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P class="p1"&gt;Here are the levers that consistently move the needle:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Warm the warehouse&lt;/P&gt;
&lt;P class="p1"&gt;Run a lightweight filter on your busiest tables after a restart. This primes caches, metadata, and remote result paths. Remember: the remote result cache sticks around for 24 hours, but only for unchanged tables and exact query matches.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Reduce file and metadata bloat&lt;/P&gt;
&lt;P class="p1"&gt;OPTIMIZE + VACUUM is still the gold standard. If you lean heavily on certain filters, Z-ORDER or Liquid Clustering will improve locality and make pruning much more decisive.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Manage concurrency&lt;/P&gt;
&lt;P class="p1"&gt;If the warehouse is swamped, the pre-scan phase gets stuck in traffic. Consider bumping warehouse size or adjusting autoscaling if your workload bursts unpredictably.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Align filters with layout&lt;/P&gt;
&lt;P class="p1"&gt;Target partition columns or clustering keys. If your WHERE clause isn’t aligned with how the data is arranged, the engine has no choice but to work harder up front.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Let the Query Profile guide you&lt;/P&gt;
&lt;P class="p1"&gt;If “PrepareDeltaScan” dominates the Top rules section, that’s metadata overhead waving its hands. Focus your fixes there.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;A quick diagnostic pass&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P class="p1"&gt;If you want to sanity-check your environment fast:&lt;/P&gt;
&lt;OL start="1"&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Compare a “slow” vs “fast” Query Profile. Look at files pruned, bytes read, and which rules are taking the most time.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Check whether the warehouse had just restarted — cold starts change everything.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Observe concurrency: autoscaling firing? Several queries landing at once?&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Inspect table health: file counts, partition spread, last OPTIMIZE/VACUUM run.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Validate predicate alignment with your partitioning or clustering strategy.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;Hope this sheds some light on the matter.&lt;/P&gt;
&lt;P class="p1"&gt;Cheers, Louis.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Dec 2025 20:38:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/understanding-what-impacts-quot-optimizing-query-amp-pruning/m-p/141195#M2409</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-12-04T20:38:22Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding what impacts "Optimizing query &amp; pruning files" time</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/understanding-what-impacts-quot-optimizing-query-amp-pruning/m-p/141276#M2413</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;,&amp;nbsp;thank you for the detailed response - really useful.&amp;nbsp;&lt;/P&gt;&lt;P&gt;We'll work through what looks most relevant to us.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Question 1: Can you confirm of the `compilation_time_ms` in the metrics from the &lt;A href="https://docs.databricks.com/api/workspace/queryhistory/list" target="_blank"&gt;list&lt;/A&gt; query endpoint corresponds to the&amp;nbsp;&lt;SPAN&gt;“Optimizing query &amp;amp; pruning files” time on the query history page.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Question 2: Is there a way to see when autoscaling is firing? Is that simply looking at the "Running cluster" on the warehouse monitoring page?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;PS: It would be great to see your explanation in the docs somewhere.&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Dec 2025 16:03:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/understanding-what-impacts-quot-optimizing-query-amp-pruning/m-p/141276#M2413</guid>
      <dc:creator>Rennzie</dc:creator>
      <dc:date>2025-12-05T16:03:33Z</dc:date>
    </item>
  </channel>
</rss>

