cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Delta table acceleration for group by on key columns using ZORDER does not work

chanansh
Contributor

What is the best practice for accelerating queries which looks like the following?

win = Window.partitionBy('key1','key2').orderBy('timestamp') 
df.select('timestamp', (F.col('col1') - F.lag('col1').over(win)).alias('col1_diff'))

I have tried to use OPTIMIZE table ZORDER key1, key2 but it does not work well.

There is a lot of shuffling going on. The data is partitioned by date which is a generated column from timestamp.

2 REPLIES 2

Anonymous
Not applicable

@Hanan Shteingartโ€‹ : To optimize queries with window functions, you should try multiple things like - filter the data, reduce the number of columns used in the window, optimize the window partitioning and ordering, and increase the cluster size if needed. Please try the below options

  1. FIlter the dataset df before applying the window function
  2. Reduce the number of columns in df before applying the window function
  3. Partition the dataframe df by choosing the right partition key that reduces the number of partitions and distributes the data evenly across the partitions
  4. Choose an ordering key that reduces the amount of data that needs to be sorted and processed
  5. Final resort, increase the size of your cluster to allocate more computing resources to the query

Anonymous
Not applicable

Hi @Hanan Shteingartโ€‹ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group