08-07-2024 09:16 AM
I am trying to understand the difference between Liquid Clustering and Z-Ordering. As I understand it, both store clustered data in ZCubes, each about 100 GB in size.
Liquid Clustering records the ZCube ID in the transaction log, so when the OPTIMIZE command runs it rearranges only the data in unclustered ZCubes. This avoids reorganizing the entire table or whole partitions.
Z-Ordering does not record ZCube IDs in the transaction log, so every OPTIMIZE run ends up reorganizing the entire table, or all the data in each partition, which is a heavy write operation.
Please correct me if I am wrong, and please add any other differences that explain the architecture of the two.
08-12-2024 12:30 AM
@Prashanth24 wrote: 1) Suppose we receive incremental data that modifies existing records. Will the existing clustered ZCubes be reorganized again? In that case, would many already-clustered ZCubes be reprocessed to maintain data locality?
Technically, because of how Delta Lake keeps table history, existing files are never updated in place; changes are always written as new files.
2) Once a table has been created with Liquid Clustering, how do we check which columns are used for LC?
Look at the table structure, e.g. by using DESCRIBE TABLE.
3) So Z-Ordering syntax cannot be used in a CREATE or ALTER TABLE statement (as it can for LC) and can only be specified with the OPTIMIZE command?
Exactly.
2) Was Liquid clustering released in Delta Lake 4.0?
No. The first preview was released in Delta Lake 3.1, and it became final in 3.2.
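To make the syntax difference from points 2) and 3) concrete, here is a small sketch that only builds the SQL statements as Python strings. The table name `sales_events` and its columns are made up for illustration, and `run_all` assumes you pass in an existing SparkSession (it is not called here). The statements follow the Databricks SQL syntax as I understand it, so please check them against the docs for your runtime.

```python
from typing import List

# Hypothetical table and column names, used only for illustration.
TABLE = "sales_events"
CLUSTER_COLS = ["customer_id", "event_date"]


def liquid_clustering_statements(table: str, cols: List[str]) -> List[str]:
    """LC: clustering columns are part of the table definition."""
    col_list = ", ".join(cols)
    return [
        # Clustering columns declared at creation time.
        f"CREATE TABLE {table} (customer_id BIGINT, event_date DATE, amount DOUBLE) "
        f"USING DELTA CLUSTER BY ({col_list})",
        # They can also be changed later on the table itself.
        f"ALTER TABLE {table} CLUSTER BY ({col_list})",
        # OPTIMIZE needs no column list; it uses the table's clustering columns.
        f"OPTIMIZE {table}",
        # Inspect the table to see which clustering columns are set.
        f"DESCRIBE TABLE {table}",
    ]


def zorder_statements(table: str, cols: List[str]) -> List[str]:
    """Z-Ordering: columns are given on each OPTIMIZE call, not on the table."""
    col_list = ", ".join(cols)
    return [f"OPTIMIZE {table} ZORDER BY ({col_list})"]


def run_all(spark, statements: List[str]) -> None:
    """Assumes `spark` is an existing SparkSession; not invoked in this sketch."""
    for stmt in statements:
        spark.sql(stmt)


lc_sql = liquid_clustering_statements(TABLE, CLUSTER_COLS)
zorder_sql = zorder_statements(TABLE, CLUSTER_COLS)
for stmt in lc_sql + zorder_sql:
    print(stmt)
```

The point of the split is that the LC column list lives with the table, while the Z-Order column list has to be repeated on every OPTIMIZE call.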
08-09-2024 05:02 AM
Hi, @acj1459, Liquid Clustering and Z-Ordering both use 100 GB ZCubes but differ in their optimization and performance characteristics. Liquid Clustering maintains ZCube IDs in the transaction log and optimizes data only within unclustered ZCubes, making it efficient for write-heavy operations with minimal reorganization. In contrast, Z-Ordering does not track ZCube IDs and reorganizes the entire table or partitions during optimization, which can result in heavier write operations but may offer better read performance. Liquid Clustering is ideal for scenarios with frequent updates, while Z-Ordering is suited for read-heavy workloads.
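As a rough illustration of the incremental behaviour described above, here is a toy model in plain Python. It is not Delta Lake's implementation: `DataFile`, `zcube_id`, and the two selection functions are hypothetical names, and the real selection logic has more rules than "has a ZCube ID or not". It only shows why tracking ZCube IDs lets OPTIMIZE skip data that is already clustered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFile:
    path: str
    size_gb: float
    zcube_id: Optional[str] = None  # None means "not yet clustered"


def files_to_rewrite_liquid(files: List[DataFile]) -> List[DataFile]:
    """Toy LC rule: only files without a recorded ZCube ID are rewritten."""
    return [f for f in files if f.zcube_id is None]


def files_to_rewrite_zorder(files: List[DataFile]) -> List[DataFile]:
    """Toy Z-Order rule as described in this thread: everything is rewritten."""
    return list(files)


# Two files already clustered into one ZCube, plus two small new files
# that arrived with an incremental load.
table = [
    DataFile("part-000.parquet", 40.0, zcube_id="zcube-a"),
    DataFile("part-001.parquet", 60.0, zcube_id="zcube-a"),
    DataFile("part-002.parquet", 1.5),
    DataFile("part-003.parquet", 0.5),
]

liquid = files_to_rewrite_liquid(table)
zorder = files_to_rewrite_zorder(table)
liquid_gb = sum(f.size_gb for f in liquid)
zorder_gb = sum(f.size_gb for f in zorder)

print(f"LC rewrites {len(liquid)} files ({liquid_gb} GB)")
print(f"Z-Order rewrites {len(zorder)} files ({zorder_gb} GB)")
```

In this made-up table, the incremental rule touches only the two new files, while the full rewrite touches all four.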
08-11-2024 08:22 AM
This is good information. It has been mentioned that LC is a replacement for both partitioning and Z-Ordering. Partitioning basically places the data in different folders, but LC won't create such folders. I have a few questions:
1) So how does it handle partitioning internally? Is this information stored anywhere?
2) Was Liquid clustering released in Delta Lake 4.0?
08-09-2024 08:33 AM - edited 08-09-2024 08:33 AM
What might also help your understanding are the design docs of Liquid Clustering (LC). There are many more differences between the two than the ones mentioned, such as Hilbert curves instead of Z-order curves. LC is incremental; Z-Ordering is not. In LC you specify the clustering columns when creating the table; in Z-Ordering you specify them with the OPTIMIZE command. Etc.
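To give a feel for the Hilbert vs. Z-order point, here is a small self-contained sketch on an 8x8 grid. It uses the textbook 2-D Morton (bit-interleaving) decode and the standard Hilbert index-to-point conversion; the function names are my own, and this says nothing about how Delta Lake implements either curve internally. It just measures how far apart consecutive points on each curve can be.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def z_order_point(d: int) -> Point:
    """Decode a Morton (Z-order) index: even bits go to x, odd bits to y."""
    x = y = 0
    bit = 0
    while d:
        x |= (d & 1) << bit
        y |= ((d >> 1) & 1) << bit
        d >>= 2
        bit += 1
    return x, y


def hilbert_point(n: int, d: int) -> Point:
    """Convert Hilbert index d to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def max_step(points: List[Point]) -> int:
    """Largest Manhattan distance between consecutive points on a path."""
    return max(
        abs(x1 - x0) + abs(y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


N = 8
z_path = [z_order_point(d) for d in range(N * N)]
h_path = [hilbert_point(N, d) for d in range(N * N)]

print("largest jump on Z-order curve:", max_step(z_path))
print("largest jump on Hilbert curve:", max_step(h_path))
```

Consecutive points on the Hilbert curve are always grid neighbours, whereas the Z-order curve makes long jumps at quadrant boundaries. That locality property is the usual argument for preferring Hilbert curves when nearby index values should map to nearby column values.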
08-11-2024 08:32 AM
I went through the document. It gave good insight into this. I have a few questions:
1) Suppose we receive incremental data that modifies existing records. Will the existing clustered ZCubes be reorganized again? In that case, would many already-clustered ZCubes be reprocessed to maintain data locality?
2) Once a table has been created with Liquid Clustering, how do we check which columns are used for LC?
3) So Z-Ordering syntax cannot be used in a CREATE or ALTER TABLE statement (as it can for LC) and can only be specified with the OPTIMIZE command?
08-11-2024 09:47 PM
Hi Prashanth,
Liquid Clustering only reorganizes parts of the data that aren't already clustered to make it more efficient. Z-Ordering, on the other hand, reorganizes the entire table or partitions every time, which is more resource-intensive.