4 weeks ago
What factors should we consider when choosing the optimization approach?
Accepted Solutions
3 weeks ago
In several ways, liquid clustering is more flexible than either Hive-style partitioning or Z-ordering.
Liquid clustering allows us to change clustering keys without rewriting the entire table. Stakeholders often query only the most recent data, so this is very useful: once we change the clustering keys, future OPTIMIZE commands rewrite only recent data.
With OPTIMIZE ZORDER, you have to specify the Z-order keys every time you run the command, which is error-prone. Liquid clustering keys are stored in the table properties, so you do not need to remember them when running OPTIMIZE.
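To make the difference concrete, here is a rough sketch of the statements involved, built as plain strings in Python. The table name `sales.events` and the columns `event_date` and `customer_id` are made up for illustration, and the helper functions are my own, not part of any library. In a Databricks notebook you would typically pass each string to `spark.sql(...)`; that step is not shown here.

```python
def zorder_optimize_sql(table: str, zorder_cols: list[str]) -> str:
    """Z-ordering: the keys must be repeated on every OPTIMIZE call."""
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


def set_cluster_keys_sql(table: str, cluster_cols: list[str]) -> str:
    """Liquid clustering: the keys are set (or changed) once on the table."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(cluster_cols)})"


def liquid_optimize_sql(table: str) -> str:
    """Liquid clustering: OPTIMIZE takes no keys, they come from the table."""
    return f"OPTIMIZE {table}"


# Hypothetical table and column names, for illustration only.
TABLE = "sales.events"

zorder_stmt = zorder_optimize_sql(TABLE, ["event_date", "customer_id"])
change_keys_stmt = set_cluster_keys_sql(TABLE, ["customer_id"])
optimize_stmt = liquid_optimize_sql(TABLE)

for stmt in (zorder_stmt, change_keys_stmt, optimize_stmt):
    print(stmt)
```

The point of the sketch is only that the Z-order statement carries the key list every time, while the liquid clustering `OPTIMIZE` does not.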
Partitioning helps data skipping for a single column, or for multiple related columns (like year/month/day), but not for unrelated columns.
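A toy model of why that is, assuming a Hive-style layout with one directory per (year, month, day). The layout, the `prune` helper, and the `customer_id` column are all invented for this example; it only models pruning at the partition-directory level and ignores any per-file statistics.

```python
from itertools import product

# Hypothetical Hive-style layout: one directory per (year, month, day).
partitions = [
    {"year": y, "month": m, "day": d}
    for y, m, d in product((2023, 2024), (1, 2), (1, 2))
]


def prune(partitions, **predicate):
    """Keep partitions whose directory values match the predicate.

    Only predicates on partition columns can be used; a predicate on any
    other column is ignored here because the directory names say nothing
    about it, so no directory can be skipped.
    """
    usable = {k: v for k, v in predicate.items() if k in partitions[0]}
    return [p for p in partitions if all(p[k] == v for k, v in usable.items())]


by_month = prune(partitions, year=2024, month=1)   # related partition columns
by_customer = prune(partitions, customer_id=42)    # unrelated column

print(len(partitions), len(by_month), len(by_customer))
```

Filtering on year and month narrows the scan to a couple of directories, while filtering on the unrelated column still touches every partition.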
Both ZORDER and liquid clustering improve data skipping for multiple independent columns. To do this, both map multi-dimensional data onto a single dimension so that data points with similar values end up grouped together. However, liquid clustering's technique is better at grouping similar data together. You can learn more about it here: https://www.youtube.com/watch?v=5t6wX28JC_M
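As a small illustration of the "map multiple dimensions into one" idea, the sketch below uses textbook Z-order (Morton) bit interleaving on a 4x4 grid of (x, y) points and counts how many "files" a min/max check would have to read. The grid, the file size of 4, and the helper names are assumptions for the example. It does not reproduce Delta's actual implementation, and it does not attempt to show liquid clustering's technique, which is not described in this thread.

```python
def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Morton (Z-order) index: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def files_to_read(ordered, column, value, file_size=4):
    """Split rows into fixed-size 'files' and count those whose min/max
    range on the given column could contain the value."""
    files = [ordered[i:i + file_size] for i in range(0, len(ordered), file_size)]
    hits = 0
    for f in files:
        vals = [row[column] for row in f]
        if min(vals) <= value <= max(vals):
            hits += 1
    return hits


points = [(x, y) for x in range(4) for y in range(4)]

linear = sorted(points)  # plain sort by x, then y
z_ordered = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))

# Files touched by a filter on x == 3 and by a filter on y == 0.
linear_x, linear_y = files_to_read(linear, 0, 3), files_to_read(linear, 1, 0)
zorder_x, zorder_y = files_to_read(z_ordered, 0, 3), files_to_read(z_ordered, 1, 0)

print("linear sort:", linear_x, linear_y)
print("z-order:    ", zorder_x, zorder_y)
```

With the plain sort, a filter on x is cheap but a filter on y reads every file. With the interleaved ordering, both filters skip half of the files, which is the trade-off that makes this kind of layout useful for independent columns.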
3 weeks ago
I have built a decision tree for thinking through this choice: https://www.canadiandataguy.com/p/optimizing-delta-lake-tables-liquid?triedRedirect=true

