Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Factors for choosing between Z-ordering, partitioning, and liquid clustering

ShivangiB
New Contributor II

What factors should we consider when choosing between these optimization approaches?

1 ACCEPTED SOLUTION

Accepted Solutions

Nik_Vanderhoof
New Contributor III

In several ways, liquid clustering is more flexible than either Hive-style partitioning or Z-ordering.

Liquid clustering lets us change the clustering keys without rewriting the entire table. Since stakeholders often query only the most recent data, this can be very powerful: we can change the cluster keys, and future OPTIMIZE commands will apply the new keys to recent data rather than rewriting the whole table.

With OPTIMIZE ZORDER, you have to specify the Z-order keys every time you run an OPTIMIZE command, which is error-prone. Liquid clustering keys are stored in the table properties, so you don't need to remember them when running OPTIMIZE.
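To make the two points above concrete, here is a rough sketch of the SQL statements involved, built as plain strings in Python. The table name `sales.events`, the column names, and the helper function names are all hypothetical; in a Databricks notebook you would typically pass each string to `spark.sql(...)`, which is not done here so the snippet runs anywhere.

```python
# Illustrative only: these helpers just build SQL text; nothing is executed.
# Table and column names below are made-up examples, not from the thread.

def zorder_optimize_sql(table: str, zorder_cols: list[str]) -> str:
    """OPTIMIZE with Z-ordering: the columns must be repeated on every run."""
    if not zorder_cols:
        raise ValueError("ZORDER BY needs at least one column")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


def set_cluster_keys_sql(table: str, cluster_cols: list[str]) -> str:
    """Liquid clustering: the keys are set once and stored with the table."""
    if not cluster_cols:
        raise ValueError("CLUSTER BY needs at least one column")
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(cluster_cols)})"


def liquid_optimize_sql(table: str) -> str:
    """Liquid clustering: OPTIMIZE takes no keys; it uses the stored ones."""
    return f"OPTIMIZE {table}"


table = "sales.events"  # hypothetical table

# Z-ordering: every scheduled job has to list the same columns again.
print(zorder_optimize_sql(table, ["event_date", "customer_id"]))

# Liquid clustering: change the keys once, then run a bare OPTIMIZE.
print(set_cluster_keys_sql(table, ["customer_id", "region"]))
print(liquid_optimize_sql(table))
```

The difference shows up in the last call: the liquid clustering OPTIMIZE carries no column list, so there is nothing for a scheduled job to get out of sync with.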

Partitioning can help data skipping on a single column, or on several related columns (like year/month/day), but not on unrelated columns.

Both ZORDER and liquid clustering are techniques for improving data skipping across multiple independent columns. To do this, both map multi-dimensional data onto a single dimension and group rows with similar values together. However, liquid clustering's technique does a better job of keeping similar data together. You can learn more about it here: https://www.youtube.com/watch?v=5t6wX28JC_M
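As a toy illustration of "mapping multi-dimensional data into a single dimension", the sketch below interleaves the bits of two integer columns into one Z-order (Morton) value, sorts by it, and splits the rows into "files". This is a simplified model, not Delta Lake's actual implementation, and it shows only the Z-order curve; the curve used by liquid clustering is not shown. The function names and the 4x4 data set are made up for the example.

```python
# Toy model of Z-ordering and min/max data skipping (not Delta's real code).

def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single integer (Morton code)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z


def split_into_files(rows, rows_per_file=4):
    """Chunk already-sorted rows into fixed-size 'files'."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_to_read(files, dim: int, value: int) -> int:
    """Count files whose min/max range on column `dim` could contain `value`."""
    count = 0
    for f in files:
        column = [row[dim] for row in f]
        if min(column) <= value <= max(column):
            count += 1
    return count


# 16 rows with two independent columns (x, y), each in 0..3.
points = [(x, y) for x in range(4) for y in range(4)]

# Layout A: sort by x only (similar to ordering on a single column).
x_files = split_into_files(sorted(points))

# Layout B: sort by the interleaved Z-value of (x, y).
z_files = split_into_files(sorted(points, key=lambda p: z_value(*p)))

print("filter x == 3:", files_to_read(x_files, 0, 3), "vs", files_to_read(z_files, 0, 3))
print("filter y == 3:", files_to_read(x_files, 1, 3), "vs", files_to_read(z_files, 1, 3))
```

In this toy data, sorting by `x` alone skips well on `x` but has to read every file for a filter on `y`, while the Z-ordered layout skips half the files for a filter on either column.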


Join us on Thursday, December 7 at 10AM PST for an enlightening session on Delta Lake's Liquid Clustering, a transformative approach in data management and optimization with Vítor Teixeira, Senior Data Engineer at Veeva Systems. Liquid Clustering is Delta Lake's answer to the complex challenges of
2 REPLIES 2

canadiandataguy
New Contributor III
