Delta table acceleration for group by on key columns using ZORDER does not work
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-03-2023 05:02 AM
What is the best practice for accelerating queries which looks like the following?
win = Window.partitionBy('key1','key2').orderBy('timestamp')
df.select('timestamp', (F.col('col1') - F.lag('col1').over(win)).alias('col1_diff'))I have tried to use OPTIMIZE table ZORDER key1, key2 but it does not work well.
There is a lot of shuffling going on. The data is partitioned by date which is a generated column from timestamp.