
spark.databricks.optimizer.replaceWindowsWithAggregates.enabled

OfirM
New Contributor

I saw in the 15.3 release notes that this was introduced, and I couldn't wrap my head around it.

Does anyone have an example of a plan before and after?

Quote:

Performance improvement for some window functions

This release includes a change that improves the performance of some Spark window functions, specifically functions that do not include an ORDER BY clause or a window_frame parameter. In these cases, the system can rewrite the query to run it using an aggregate function. This change allows the query to run faster by using partial aggregation and avoiding the overhead of running window functions. The Spark configuration parameter spark.databricks.optimizer.replaceWindowsWithAggregates.enabled controls this optimization and is set to true by default. To turn this optimization off, set spark.databricks.optimizer.replaceWindowsWithAggregates.enabled to false.

 

Walter_C
Databricks Employee

Before Optimization:

Consider a query that calculates the sum of a column value partitioned by category without an ORDER BY clause or a window_frame parameter:

 
SELECT category, SUM(value) OVER (PARTITION BY category) AS total_value
FROM sales;
 

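Without the optimization, the plan uses a Window operator: every row is shuffled by category and sorted within its partition before the sum is computed, and no partial aggregation can take place before the shuffle. On large inputs this can be expensive.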

After Optimization:

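With the optimization enabled, the window computation can instead be done with an aggregate function, which improves performance by leveraging partial aggregation. Conceptually, the per-category sum is computed once with GROUP BY and then attached back to every row of sales, because a window function returns one row per input row while a GROUP BY on its own returns one row per category:

SELECT s.category, a.total_value
FROM sales AS s
JOIN (
    SELECT category, SUM(value) AS total_value
    FROM sales
    GROUP BY category
) AS a
  ON s.category <=> a.category;

This SQL only illustrates a query with the same result. The actual rewrite happens inside the optimizer, so the plan you see may not look exactly like this join. The null-safe comparison (<=>) is needed because PARTITION BY places all NULL categories in a single partition. The gain comes from replacing the window computation with an aggregation that can be partially computed before the shuffle.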
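As a quick check on the semantics (not on Spark's plan), the sketch below uses Python's built-in sqlite3 module as a stand-in engine. The table contents are made up for illustration. It shows that the windowed SUM returns one row per input row, that a bare GROUP BY returns one row per category, and that joining the grouped result back to sales with a null-safe comparison reproduces the window output. SQLite's IS operator plays the role of Spark SQL's <=> here, and window functions need SQLite 3.25 or later.

```python
import sqlite3
from collections import Counter

# In-memory stand-in for the `sales` table; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("a", 10), ("a", 5), ("b", 7), (None, 1), (None, 2)],
)

# Window aggregate with no ORDER BY and no frame: one output row per input row.
window_rows = conn.execute(
    """
    SELECT category, SUM(value) OVER (PARTITION BY category) AS total_value
    FROM sales
    """
).fetchall()

# Plain GROUP BY: one output row per category, so not equivalent on its own.
group_rows = conn.execute(
    "SELECT category, SUM(value) AS total_value FROM sales GROUP BY category"
).fetchall()

# Aggregate once, then join back to every input row.
# `IS` is SQLite's null-safe equality (Spark SQL would use `<=>`).
joined_rows = conn.execute(
    """
    SELECT s.category, a.total_value
    FROM sales AS s
    JOIN (
        SELECT category, SUM(value) AS total_value
        FROM sales
        GROUP BY category
    ) AS a
      ON s.category IS a.category
    """
).fetchall()

# Compare as multisets because row order is not guaranteed.
same_result = Counter(window_rows) == Counter(joined_rows)
print(len(window_rows), len(group_rows), same_result)
```

The comparison uses Counter rather than sorting because the rows contain None, which Python cannot order against strings.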

The optimization applies only to eligible window functions: those with no ORDER BY clause and no window_frame parameter. It is controlled by the Spark configuration parameter spark.databricks.optimizer.replaceWindowsWithAggregates.enabled, which is set to true by default. To turn the optimization off, set it to false.
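To see the before and after plans on your own cluster, one option is to toggle the parameter and capture the EXPLAIN output each time. The helper below is a minimal sketch, not an official recipe. The function names are hypothetical, it assumes a SparkSession is passed in as spark (as predefined in a Databricks notebook), that a sales table exists, and that the parameter can be changed at session level with spark.conf.set.

```python
# Configuration key quoted from the release notes.
CONF_KEY = "spark.databricks.optimizer.replaceWindowsWithAggregates.enabled"

# Example query from this thread; assumes a `sales` table exists.
QUERY = """
SELECT category, SUM(value) OVER (PARTITION BY category) AS total_value
FROM sales
"""


def explain_with_flag(spark, query, enabled):
    """Set the flag, then return the EXPLAIN FORMATTED text for `query`.

    `spark` is assumed to be an active SparkSession.
    """
    spark.conf.set(CONF_KEY, "true" if enabled else "false")
    # EXPLAIN returns a single row whose only column holds the plan text.
    rows = spark.sql("EXPLAIN FORMATTED " + query).collect()
    return rows[0][0]


def compare_plans(spark, query):
    """Return (plan_with_flag_off, plan_with_flag_on).

    The flag is enabled last, which leaves it at its documented default (true).
    """
    plan_off = explain_with_flag(spark, query, enabled=False)
    plan_on = explain_with_flag(spark, query, enabled=True)
    return plan_off, plan_on
```

In a notebook on Databricks Runtime 15.3 or later, calling plan_off, plan_on = compare_plans(spark, QUERY) and printing both strings lets you compare which operators appear in each plan, for example whether a Window node is still present.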
