<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic spark.databricks.optimizer.replaceWindowsWithAggregates.enabled in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/spark-databricks-optimizer-replacewindowswithaggregates-enabled/m-p/103555#M1772</link>
    <description>&lt;P&gt;I saw in the Databricks Runtime 15.3 release notes that this was introduced, and I couldn't wrap my head around it.&lt;/P&gt;&lt;P&gt;Does someone have an example of a plan before and after?&lt;/P&gt;&lt;P&gt;Quote:&lt;/P&gt;&lt;DIV&gt;&lt;H3&gt;Performance improvement for some window functions&lt;/H3&gt;&lt;P&gt;This release includes a change that improves the performance of some Spark window functions, specifically functions that do not include an &lt;CODE&gt;ORDER BY&lt;/CODE&gt; clause or a &lt;CODE&gt;window_frame&lt;/CODE&gt; parameter. In these cases, the system can rewrite the query to run it using an aggregate function. This change allows the query to run faster by using partial aggregation and avoiding the overhead of running window functions. The Spark configuration parameter &lt;CODE&gt;spark.databricks.optimizer.replaceWindowsWithAggregates.enabled&lt;/CODE&gt; controls this optimization and is set to &lt;CODE&gt;true&lt;/CODE&gt; by default. To turn this optimization off, set &lt;CODE&gt;spark.databricks.optimizer.replaceWindowsWithAggregates.enabled&lt;/CODE&gt; to &lt;CODE&gt;false&lt;/CODE&gt;.&lt;/P&gt;&lt;/DIV&gt;</description>
    <pubDate>Mon, 30 Dec 2024 14:43:11 GMT</pubDate>
    <dc:creator>OfirM</dc:creator>
    <dc:date>2024-12-30T14:43:11Z</dc:date>
    <item>
      <title>spark.databricks.optimizer.replaceWindowsWithAggregates.enabled</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/spark-databricks-optimizer-replacewindowswithaggregates-enabled/m-p/103555#M1772</link>
      <description>&lt;P&gt;I saw in the Databricks Runtime 15.3 release notes that this was introduced, and I couldn't wrap my head around it.&lt;/P&gt;&lt;P&gt;Does someone have an example of a plan before and after?&lt;/P&gt;&lt;P&gt;Quote:&lt;/P&gt;&lt;DIV&gt;&lt;H3&gt;Performance improvement for some window functions&lt;/H3&gt;&lt;P&gt;This release includes a change that improves the performance of some Spark window functions, specifically functions that do not include an &lt;CODE&gt;ORDER BY&lt;/CODE&gt; clause or a &lt;CODE&gt;window_frame&lt;/CODE&gt; parameter. In these cases, the system can rewrite the query to run it using an aggregate function. This change allows the query to run faster by using partial aggregation and avoiding the overhead of running window functions. The Spark configuration parameter &lt;CODE&gt;spark.databricks.optimizer.replaceWindowsWithAggregates.enabled&lt;/CODE&gt; controls this optimization and is set to &lt;CODE&gt;true&lt;/CODE&gt; by default. To turn this optimization off, set &lt;CODE&gt;spark.databricks.optimizer.replaceWindowsWithAggregates.enabled&lt;/CODE&gt; to &lt;CODE&gt;false&lt;/CODE&gt;.&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 30 Dec 2024 14:43:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/spark-databricks-optimizer-replacewindowswithaggregates-enabled/m-p/103555#M1772</guid>
      <dc:creator>OfirM</dc:creator>
      <dc:date>2024-12-30T14:43:11Z</dc:date>
    </item>
    <item>
      <title>Re: spark.databricks.optimizer.replaceWindowsWithAggregates.enabled</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/spark-databricks-optimizer-replacewindowswithaggregates-enabled/m-p/103558#M1773</link>
      <description>&lt;H3 class="_1jeaq5e0 _1t7bu9h9 heading3"&gt;Before Optimization:&lt;/H3&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;Consider a query that calculates the sum of a column &lt;CODE&gt;value&lt;/CODE&gt; partitioned by &lt;CODE&gt;category&lt;/CODE&gt; without an &lt;CODE&gt;ORDER BY&lt;/CODE&gt; clause or a &lt;CODE&gt;window_frame&lt;/CODE&gt; parameter:&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="_1sijkvt3"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql _1t7bu9hb hljs language-sql gb5fhw3"&gt;&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; category, &lt;SPAN class="hljs-built_in"&gt;SUM&lt;/SPAN&gt;(&lt;SPAN class="hljs-keyword"&gt;value&lt;/SPAN&gt;) &lt;SPAN class="hljs-keyword"&gt;OVER&lt;/SPAN&gt; (&lt;SPAN class="hljs-keyword"&gt;PARTITION&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;BY&lt;/SPAN&gt; category) &lt;SPAN class="hljs-keyword"&gt;AS&lt;/SPAN&gt; total_value
&lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; sales;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="gb5fhw4"&gt;
&lt;DIV&gt;
&lt;DIV class=" iwpqfy0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;In this case, the query plan would involve a full window function execution, which can be computationally expensive.&lt;/P&gt;
&lt;H3 class="_1jeaq5e0 _1t7bu9h9 heading3"&gt;After Optimization:&lt;/H3&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;With the optimization enabled, the query can be rewritten to use an aggregate function instead, which improves performance by leveraging partial aggregation:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql _1t7bu9hb hljs language-sql gb5fhw3"&gt;&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; category, total_value
&lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; (
    &lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; category, &lt;SPAN class="hljs-built_in"&gt;SUM&lt;/SPAN&gt;(&lt;SPAN class="hljs-keyword"&gt;value&lt;/SPAN&gt;) &lt;SPAN class="hljs-keyword"&gt;AS&lt;/SPAN&gt; total_value
    &lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; sales
    &lt;SPAN class="hljs-keyword"&gt;GROUP&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;BY&lt;/SPAN&gt; category
) &lt;SPAN class="hljs-keyword"&gt;AS&lt;/SPAN&gt; aggregated_sales;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
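&lt;P&gt;To see the difference yourself, a sketch along these lines should work (assuming Databricks Runtime 15.3+; the &lt;CODE&gt;sales&lt;/CODE&gt; table with &lt;CODE&gt;category&lt;/CODE&gt; and &lt;CODE&gt;value&lt;/CODE&gt; columns is illustrative): run &lt;CODE&gt;EXPLAIN&lt;/CODE&gt; on the windowed query with the flag off and then on, and compare the two plans:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;-- Sketch, not verified output: compare plans with the rewrite off and on.
SET spark.databricks.optimizer.replaceWindowsWithAggregates.enabled = false;
EXPLAIN SELECT category, SUM(value) OVER (PARTITION BY category) AS total_value FROM sales;
-- Expect a Window operator in the plan.

SET spark.databricks.optimizer.replaceWindowsWithAggregates.enabled = true;
EXPLAIN SELECT category, SUM(value) OVER (PARTITION BY category) AS total_value FROM sales;
-- Expect partial/final Aggregate nodes instead of the Window operator.&lt;/CODE&gt;&lt;/PRE&gt;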
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;This rewritten query avoids the overhead of running a window function by using a simple aggregation, which is more efficient.&lt;BR /&gt;&lt;BR /&gt;The optimization works by rewriting eligible window functions (those without an &lt;CODE&gt;ORDER BY&lt;/CODE&gt; clause or a &lt;CODE&gt;window_frame&lt;/CODE&gt; parameter) to use aggregate functions. This change allows the query to run faster by using partial aggregation and avoiding the overhead associated with window functions. The Spark configuration parameter &lt;CODE&gt;spark.databricks.optimizer.replaceWindowsWithAggregates.enabled&lt;/CODE&gt; controls this optimization and is set to &lt;CODE&gt;true&lt;/CODE&gt; by default. To turn this optimization off, set &lt;CODE&gt;spark.databricks.optimizer.replaceWindowsWithAggregates.enabled&lt;/CODE&gt; to &lt;CODE&gt;false&lt;/CODE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 30 Dec 2024 14:45:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/spark-databricks-optimizer-replacewindowswithaggregates-enabled/m-p/103558#M1773</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-30T14:45:27Z</dc:date>
    </item>
  </channel>
</rss>

