Sunday
I want to understand the difference between Liquid Clustering and Z-ordering, and how each of them works.
Sunday
Hi @Rupa0503
Liquid Clustering is basically the modern replacement for Z-ordering. Both are great for data skipping (faster reads), but Liquid fixes a lot of Z-order's headaches.
How They Work (and why Liquid wins)
Z-Ordering: It's rigid. When you add new data and run OPTIMIZE, it often has to rewrite a ton of your existing files to keep things sorted. It's slow and computationally expensive.
Liquid Clustering: It's flexible and incremental. When you optimize, Databricks only processes what it needs to. It's way faster to update, handles skewed data better, and lets you change clustering keys without rewriting the whole table.
How to Use It / Migrate
Moving from Z-order to Liquid is super easy using ALTER TABLE:
Use Standard Liquid: ALTER TABLE table CLUSTER BY (col1, col2) (Just remember to run OPTIMIZE afterward!)
Use Auto Liquid: ALTER TABLE table CLUSTER BY AUTO (Note: requires Predictive Optimization enabled)
Turn it off: ALTER TABLE table CLUSTER BY NONE
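If you script the migration from a notebook, the three options above can be wrapped in a small helper. This is only a sketch: the function name `clustering_statements`, the mode labels, and the table name `main.sales.orders` are made up for illustration, and it only builds the SQL strings shown above; it does not talk to Databricks itself.

```python
def clustering_statements(table, mode, columns=()):
    """Build the SQL statements for one of the three clustering options.

    `mode` is a hypothetical label used only by this helper:
    "standard", "auto", or "none".
    """
    if mode == "standard":
        if not columns:
            raise ValueError("standard clustering needs at least one column")
        cols = ", ".join(columns)
        # Set the clustering keys, then run OPTIMIZE so data gets clustered.
        return [
            f"ALTER TABLE {table} CLUSTER BY ({cols})",
            f"OPTIMIZE {table}",
        ]
    if mode == "auto":
        # Requires Predictive Optimization to be enabled.
        return [f"ALTER TABLE {table} CLUSTER BY AUTO"]
    if mode == "none":
        return [f"ALTER TABLE {table} CLUSTER BY NONE"]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical table and column names, for illustration only.
statements = clustering_statements(
    "main.sales.orders", "standard", ["order_date", "customer_id"]
)
```

In a Databricks notebook you would then run each statement with something like `for stmt in statements: spark.sql(stmt)`, assuming the usual `spark` session is available.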
My Personal Benchmarks & Recommendation
I tested Z-order, Standard Liquid, and Auto Liquid with the exact same data and tables. Here is the verdict:
Reads: All three perform about the same.
Writes/Optimization: Auto Liquid is definitely the fastest.
Cost (My Pick): I personally stick to Standard Liquid Clustering to save money. Auto Liquid uses Predictive Optimization, which runs on Serverless compute and adds extra costs. Standard Liquid gives you all the incremental speed benefits over Z-order, but leaves you in control of your compute bill!
Sunday
More details here
Sunday
Hi @Rupa0503,
In simple terms... both Liquid Clustering and Z-ordering are ways to improve data layout so Databricks can skip more irrelevant files during reads, but they are not the same thing.
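To make "skipping irrelevant files" a bit more concrete, here is a toy sketch in plain Python. It is not how Databricks implements anything; it just assumes each "file" keeps a min and max for a column and that a reader can ignore any file whose range cannot contain the value being searched for. The file size, the data, and the function names are all invented for the example.

```python
import random


def split_into_files(values, rows_per_file):
    """Chunk rows into 'files' and record each file's min/max for the column."""
    files = []
    for i in range(0, len(values), rows_per_file):
        chunk = values[i:i + rows_per_file]
        files.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return files


def files_to_scan(files, target):
    """Count files whose [min, max] range could contain the target value."""
    return sum(1 for f in files if f["min"] <= target <= f["max"])


random.seed(0)
shuffled = list(range(1000))
random.shuffle(shuffled)

# Same rows, two layouts: arrival order vs. sorted on the filter column.
unclustered = split_into_files(shuffled, 100)
clustered = split_into_files(sorted(shuffled), 100)

unclustered_scans = files_to_scan(unclustered, 437)
clustered_scans = files_to_scan(clustered, 437)
print(f"unclustered: {unclustered_scans} files, clustered: {clustered_scans} file(s)")
```

With the sorted layout only one file can contain the value, while the shuffled layout leaves wide, overlapping min/max ranges, so far fewer files can be ruled out. Both Z-ordering and Liquid Clustering are ways of getting closer to the first situation for the columns you filter on.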
If I had to summarise it simply... Z-ordering is the older, more manual way to colocate related values in the same set of files, while Liquid Clustering is the newer, more flexible approach that Databricks now recommends for new tables.
A practical difference is that Liquid Clustering is designed to replace both partitioning and ZORDER for table layout, and it is not compatible with Z-ordering on the same table.
Here's the intuitive version of how they work:
One nice way to think about it is this.. Imagine a warehouse.
A few concrete differences that usually help..
So if you want a simple rule of thumb... a) Use Liquid Clustering for new tables. b) Think of Z-ordering mainly as the older layout optimisation mechanism you may still see on existing tables. c) Don't use both together on the same table.
To @ShamenParis's point around costs... Standard Liquid Clustering can be a good fit if you want the benefits of Liquid Clustering but prefer to control when maintenance runs and what compute it uses. Automatic Liquid Clustering depends on Predictive Optimization, which runs maintenance on serverless jobs compute and is billed separately. That said, Auto Liquid is designed to be cost-aware and can reduce overall TCO when the performance gains justify the maintenance cost. I wouldn't classify that as an expensive mode.
If useful, the official docs here are the best references... Liquid Clustering docs and Data skipping and Z-ordering docs.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
9 hours ago
Z-Ordering physically sorts data using multi-dimensional ordering, but degrades as new data arrives, requiring full, expensive OPTIMIZE reruns to maintain.
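For anyone curious what "multi-dimensional ordering" means, the textbook idea behind a Z-order curve is to interleave the bits of the column values into a single sort key (a Morton code). The sketch below shows that idea on a tiny 4x4 grid; it is the generic technique, not a claim about how Delta implements ZORDER internally, and the function name and bit width are my own choices.

```python
def z_value(x, y, bits=8):
    """Interleave the bits of x and y into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return key


# All points on a 4x4 grid, sorted along the Z-order curve.
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: z_value(*p))

# Cut the sorted points into "files" of four rows each.
files = [z_sorted[i:i + 4] for i in range(0, len(z_sorted), 4)]
for f in files:
    print(f)
```

Each group of four ends up covering one 2x2 corner of the grid, so every "file" has a narrow range on both x and y at once. Sorting by x alone would give narrow x ranges but each file would span the full range of y.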
Liquid Clustering (DBR 13.3+) replaces both Z-Ordering and Hive-style partitioning. You define it once with CLUSTER BY (col1, col2), and OPTIMIZE only rewrites "drifted" files incrementally, which is much cheaper.
Key wins with Liquid Clustering:
Bottom line: For any new table, go with Liquid Clustering. Z-Ordering is legacy at this point.