Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Factors for choosing between Z-ordering, partitioning, and liquid clustering

ShivangiB
New Contributor II

What factors should determine which optimization approach we choose?

Accepted Solution

Nik_Vanderhoof
Contributor

In several ways, liquid clustering is more flexible than either Hive-style partitioning or Z-ordering.

Liquid clustering lets us change clustering keys without rewriting the entire table. Since stakeholders often query only the more recent data, this can be very powerful: we can change the cluster keys, and future OPTIMIZE commands will rewrite only recent data rather than the full history.

With OPTIMIZE ... ZORDER BY, you have to specify the Z-order keys every time you run the OPTIMIZE command, which is error-prone. Liquid clustering keys are stored in the table's properties, so you do not need to remember them when running OPTIMIZE.
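To make the difference concrete, here is a minimal Python sketch that only builds the SQL strings involved. The table name `events` and the column names are hypothetical, and the helper functions are illustrative, not part of any Databricks API. In a notebook you would typically pass the resulting strings to `spark.sql(...)`.

```python
def zorder_optimize_sql(table, zorder_cols):
    """Z-ordering: the key columns must be repeated on every OPTIMIZE run."""
    if not zorder_cols:
        raise ValueError("ZORDER BY needs at least one column")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


def set_cluster_keys_sql(table, cluster_cols):
    """Liquid clustering: the keys are declared once and stored with the table."""
    if not cluster_cols:
        raise ValueError("CLUSTER BY needs at least one column")
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(cluster_cols)})"


def liquid_optimize_sql(table):
    """Liquid clustering: OPTIMIZE takes no keys; it uses the stored ones."""
    return f"OPTIMIZE {table}"


# Hypothetical table and columns, for illustration only.
zorder_stmt = zorder_optimize_sql("events", ["customer_id", "event_type"])
declare_stmt = set_cluster_keys_sql("events", ["customer_id", "event_type"])
rekey_stmt = set_cluster_keys_sql("events", ["device_id"])
optimize_stmt = liquid_optimize_sql("events")

for stmt in (zorder_stmt, declare_stmt, rekey_stmt, optimize_stmt):
    print(stmt)
```

With Z-ordering, every scheduled job has to carry the same column list. With liquid clustering, the job only ever issues the plain `OPTIMIZE events`, and changing keys is a single `ALTER TABLE` statement.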

Partitioning helps data skipping for a single column, or for several related columns (such as year/month/day), but not for unrelated columns.
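As a rough illustration of why that is, here is a toy Python model of partition pruning. It is a simplification under stated assumptions: the rows, column names, and helper functions are made up, and it ignores the file-level min/max statistics that Delta tables also keep.

```python
from collections import defaultdict

# Hypothetical sample rows, for illustration only.
rows = [
    {"year": 2023, "month": 11, "customer_id": 1},
    {"year": 2023, "month": 12, "customer_id": 2},
    {"year": 2024, "month": 1, "customer_id": 1},
    {"year": 2024, "month": 2, "customer_id": 3},
]


def write_partitioned(rows, partition_cols):
    """Group rows into 'directories' keyed by their partition column values."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in partition_cols)
        partitions[key].append(row)
    return dict(partitions)


def partitions_to_scan(partitions, partition_cols, predicate):
    """Return the partition keys a query must read.

    Only equality predicates on partition columns can prune; predicates
    on any other column are ignored here, so every partition is kept.
    """
    usable = {c: v for c, v in predicate.items() if c in partition_cols}
    selected = []
    for key in partitions:
        key_values = dict(zip(partition_cols, key))
        if all(key_values[c] == v for c, v in usable.items()):
            selected.append(key)
    return selected


partition_cols = ["year", "month"]
table = write_partitioned(rows, partition_cols)

by_year = partitions_to_scan(table, partition_cols, {"year": 2024})
by_customer = partitions_to_scan(table, partition_cols, {"customer_id": 1})

print(len(by_year), "of", len(table), "partitions read for year = 2024")
print(len(by_customer), "of", len(table), "partitions read for customer_id = 1")
```

In this model, the filter on `year` reads only the 2024 partitions, while the filter on `customer_id` gets no help from the directory layout and has to read every partition.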

Both ZORDER and Liquid Clustering are techniques to improve data skipping for multiple independent columns. To do this, both map multi-dimensional data onto a single dimension and group data points with similar values together. However, Liquid Clustering's technique is better at keeping similar data together. You can learn more about it here: https://www.youtube.com/watch?v=5t6wX28JC_M
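To show what "mapping multi-dimensional data onto a single dimension" can look like, here is a small Python sketch of the classic Z-order (Morton) bit-interleaving idea on a 4x4 grid. It is an illustration of the general technique only: the function names and the "four rows per file" layout are assumptions, it is not Databricks' implementation, and the curve Liquid Clustering uses is not shown.

```python
def z_value(x, y, bits=16):
    """Interleave the bits of x and y: x in even positions, y in odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def files_touched(files, column_index, value):
    """Count files whose min/max range on one column could contain `value`."""
    count = 0
    for f in files:
        values = [point[column_index] for point in f]
        if min(values) <= value <= max(values):
            count += 1
    return count


# All points of a 4x4 grid, sorted by their one-dimensional Z-value.
points = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(points, key=lambda p: z_value(*p))

# Assumed layout: pretend each "file" holds four consecutive rows.
files = [ordered[i:i + 4] for i in range(0, len(ordered), 4)]

for f in files:
    print(f)

print("files read for x == 0:", files_touched(files, 0, 0))
print("files read for y == 0:", files_touched(files, 1, 0))
```

Because each file ends up covering a 2x2 block of the grid, a filter on either column alone can skip half of the files. Sorting by `x` only would make filters on `x` cheaper but give no skipping at all for `y`.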


Join us on Thursday, December 7 at 10AM PST for an enlightening session on Delta Lake's Liquid Clustering, a transformative approach in data management and optimization with Vítor Teixeira, Senior Data Engineer at Veeva Systems. Liquid Clustering is Delta Lake's answer to the complex challenges of
