
Difference between Liquid clustering and Z-ordering

Prashanth24
New Contributor III

I am trying to understand the difference between Liquid Clustering and Z-ordering. As I understand it, both store the clustered data in ZCubes, each about 100 GB in size.

Liquid Clustering records the ZCube ID in the transaction log, so when the OPTIMIZE command runs it rearranges only the data in unclustered ZCubes. This avoids reorganizing the entire table or whole partitions.

Z-ordering does not record the ZCube ID in the transaction log, so every OPTIMIZE run ends up reorganizing the entire table, or all the data in each partition, which is a heavy write operation.

Please correct me if I am wrong, and add any other differences that explain the architecture of the two.

Accepted Solution


@Prashanth24 wrote:

1) Suppose we receive incremental data containing modifications to existing records. Will the existing clustered ZCubes be reorganized again? That is, will many already-clustered ZCubes be reprocessed to maintain data locality?

Technically, because Delta Lake keeps a versioned history of the table, existing data files are never updated in place; a change always creates new files.
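To make the point above concrete, here is a minimal toy sketch of copy-on-write at the file level. It is not Delta Lake's implementation: the `ToyDeltaLog` class, its method names, and the file naming are all hypothetical, and whether a rewritten file ends up in an existing ZCube is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaLog:
    """Hypothetical toy log: an ordered list of add/remove actions over immutable files."""
    actions: list = field(default_factory=list)
    _counter: int = 0

    def add_file(self, rows):
        # Every write produces a brand-new file name.
        self._counter += 1
        name = f"part-{self._counter:05d}.parquet"
        self.actions.append(("add", name, dict(rows)))
        return name

    def update_row(self, file_name, key, value):
        # The old file is never edited: its rows are copied with the change
        # applied, the old file is logically removed, and a new file is added.
        new_rows = {**self.live_files()[file_name], key: value}
        self.actions.append(("remove", file_name, None))
        return self.add_file(new_rows)

    def live_files(self):
        # Replay the log to find the files that make up the current version.
        live = {}
        for kind, name, rows in self.actions:
            if kind == "add":
                live[name] = rows
            else:
                live.pop(name, None)
        return live


log = ToyDeltaLog()
old_file = log.add_file({"id-1": "a", "id-2": "b"})
new_file = log.update_row(old_file, "id-2", "B")
```

After the update, the log holds three actions (add, remove, add) and only `new_file` is live. The old file still exists physically in this model, which is what makes older table versions readable.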

2) Once a table has been created with Liquid Clustering, how do we check which columns are used for LC?

Look at the table structure, e.g. by using DESCRIBE TABLE.

3) So Z-ordering syntax cannot be used in a CREATE or ALTER TABLE statement (as it can for LC) and can only be run with the OPTIMIZE command?


Exactly.
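As a rough illustration of where each feature is declared, the sketch below builds the relevant SQL statements as Python strings. The table name `events` and the columns are hypothetical, and the helper functions are only for illustration; check the exact syntax against the documentation for your runtime before using it.

```python
def create_with_liquid_clustering(table, columns_ddl, cluster_cols):
    # Liquid Clustering: the clustering columns are part of the table definition.
    return (f"CREATE TABLE {table} ({columns_ddl}) USING DELTA "
            f"CLUSTER BY ({', '.join(cluster_cols)})")


def alter_clustering(table, cluster_cols):
    # Liquid Clustering: the clustering columns can be changed later with ALTER TABLE.
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(cluster_cols)})"


def optimize_zorder(table, zorder_cols):
    # Z-ordering: the columns are named only on the OPTIMIZE command itself.
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


def describe(table):
    # Inspect the table metadata, as suggested in the answer to question 2.
    return f"DESCRIBE TABLE {table}"


statements = [
    create_with_liquid_clustering("events", "id BIGINT, ts TIMESTAMP", ["id", "ts"]),
    alter_clustering("events", ["ts"]),
    optimize_zorder("events", ["id"]),
    describe("events"),
]
```

Each string could then be passed to `spark.sql(...)` in a notebook, assuming a Spark session with Delta Lake is available.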

2) Was Liquid clustering released in Delta Lake 4.0?


No. The first preview was released in Delta Lake 3.1, and it became final in 3.2.


5 Replies

This is good information. It has been mentioned that LC is a replacement for both partitioning and Z-ordering. Partitioning basically places the data in different folders; LC does not create such folders. I have a few questions:

1) So how does it handle partitioning internally? Is this information stored anywhere?

2) Was Liquid clustering released in Delta Lake 4.0?

Witold
Contributor III

What might also help your understanding are the design docs for Liquid Clustering (LC). There are many more differences between the two than the ones mentioned. For example, LC uses Hilbert curves instead of Z-order curves. LC is incremental; Z-ordering is not. In LC you specify the clustering columns at table creation; with Z-ordering you specify them in the OPTIMIZE command. And so on.
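For intuition on the curve difference, here is a small two-dimensional sketch of both space-filling curves. It uses the textbook bit-interleaving (Morton) and Hilbert `xy2d` formulations on an integer grid. This is an assumption made for illustration: the actual engine works on more dimensions and on range-partitioned column values, and the function names are hypothetical.

```python
def z_order_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (Morton code)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        index |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return index


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def max_step(n: int, index_fn) -> int:
    """Largest Manhattan distance between cells that are consecutive on the curve."""
    cells = sorted(((x, y) for x in range(n) for y in range(n)), key=index_fn)
    return max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(cells, cells[1:]))


N = 8
z_max = max_step(N, lambda c: z_order_index(c[0], c[1]))
h_max = max_step(N, lambda c: hilbert_index(N, c[0], c[1]))
```

On this grid, consecutive Hilbert cells are always direct neighbours, while the Z-order curve makes occasional long jumps. That better locality is the usual argument for Hilbert curves when nearby values should land in the same files.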

Prashanth24
New Contributor III

I went through the document; it gave good insight. I have a few questions:

1) Suppose we receive incremental data containing modifications to existing records. Will the existing clustered ZCubes be reorganized again? That is, will many already-clustered ZCubes be reprocessed to maintain data locality?

2) Once a table has been created with Liquid Clustering, how do we check which columns are used for LC?

3) So Z-ordering syntax cannot be used in a CREATE or ALTER TABLE statement (as it can for LC) and can only be run with the OPTIMIZE command?

 



Brahmareddy
Valued Contributor III

Hi Prashanth,

Liquid Clustering only reorganizes parts of the data that aren't already clustered to make it more efficient. Z-Ordering, on the other hand, reorganizes the entire table or partitions every time, which is more resource-intensive.
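The difference described above can be sketched as a toy file-selection rule. This is only a model of the behaviour as described in this thread, not the engine's actual logic: the `DataFile` class, the `zcube_id` field, and the sizes are hypothetical, and real selection rules (such as cube size thresholds) are more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataFile:
    name: str
    size_gb: int
    zcube_id: Optional[str] = None  # None means the file is not clustered yet


def select_full_rewrite(files):
    # Thread's description of Z-ordering: no cube bookkeeping, so every file is a candidate.
    return list(files)


def select_incremental(files):
    # Thread's description of Liquid Clustering: skip files already tagged with a ZCube ID.
    return [f for f in files if f.zcube_id is None]


files = [
    DataFile("part-1", 60, "cube-1"),
    DataFile("part-2", 40, "cube-1"),
    DataFile("part-3", 5),   # newly ingested, unclustered
    DataFile("part-4", 3),   # newly ingested, unclustered
]

full_gb = sum(f.size_gb for f in select_full_rewrite(files))
incremental_gb = sum(f.size_gb for f in select_incremental(files))
```

In this toy table, the full rewrite touches 108 GB while the incremental pass touches only the 8 GB of new data.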
