
Liquid Clustering VS Z-ordering

Rupa0503
New Contributor II

I want to understand the difference between Liquid Clustering and Z-ordering, and how each of them works.

2 REPLIES

ShamenParis
New Contributor II

Hi @Rupa0503 

Liquid Clustering is basically the modern replacement for Z-ordering. Both are great for data skipping (faster reads), but Liquid fixes a lot of Z-order's headaches.

How They Work (and why Liquid wins)

  • Z-Ordering: It's rigid. When you add new data and run OPTIMIZE, it often has to rewrite a ton of your existing files to keep things sorted. It's slow and computationally expensive.

  • Liquid Clustering: It's flexible and incremental. When you optimize, Databricks only processes what it needs to. It's way faster to update, handles skewed data better, and lets you change clustering keys without rewriting the whole table.
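To make "data skipping" concrete, here is a minimal sketch in plain Python, not Databricks code. It assumes each file keeps min/max statistics for a column and that a reader can skip any file whose range cannot match the filter. The helper names (make_files, files_to_scan), the file size, and the data are all made up for illustration.

```python
import random

def make_files(values, rows_per_file):
    """Split values into 'files' and keep only each file's (min, max) stats."""
    files = []
    for i in range(0, len(values), rows_per_file):
        chunk = values[i:i + rows_per_file]
        files.append((min(chunk), max(chunk)))
    return files

def files_to_scan(files, lo, hi):
    """Count files whose [min, max] range overlaps the filter lo <= x <= hi."""
    return sum(1 for fmin, fmax in files if fmax >= lo and fmin <= hi)

random.seed(0)
values = list(range(10_000))
random.shuffle(values)  # rows arrive in no particular order

unclustered = make_files(values, 1_000)        # layout in arrival order
clustered = make_files(sorted(values), 1_000)  # layout clustered on the filter column

print("unclustered files scanned:", files_to_scan(unclustered, 2_500, 2_600))
print("clustered files scanned:", files_to_scan(clustered, 2_500, 2_600))
```

With the clustered layout only one file has a range that overlaps the filter, while in the arrival-order layout nearly every file spans most of the value range and has to be read. Both Z-ordering and Liquid Clustering aim for the first situation; they differ in how the layout is maintained.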

How to Use It / Migrate

Moving from Z-order to Liquid is super easy using ALTER TABLE:

  • Use Standard Liquid: ALTER TABLE table CLUSTER BY (col1, col2) (Just remember to run OPTIMIZE afterward!)

  • Use Auto Liquid: ALTER TABLE table CLUSTER BY AUTO (Note: requires Predictive Optimization enabled)

  • Turn it off: ALTER TABLE table CLUSTER BY NONE
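If you want to script the switch, the statements above could be wrapped roughly like this. This is only a sketch: migrate_to_liquid and the table name "sales" are hypothetical, and run_sql stands for any callable that executes one SQL string (for example spark.sql in a notebook). The example passes a list's append method so it runs without a cluster.

```python
def migrate_to_liquid(run_sql, table, columns=None):
    """Issue the statements that switch `table` to Liquid Clustering.

    run_sql: any callable that executes one SQL string (assumed, e.g. spark.sql).
    columns: clustering keys; None means CLUSTER BY AUTO.
    """
    if columns:
        keys = ", ".join(columns)
        run_sql(f"ALTER TABLE {table} CLUSTER BY ({keys})")
        # Standard Liquid: run OPTIMIZE afterward, as noted above.
        run_sql(f"OPTIMIZE {table}")
    else:
        # Auto Liquid: needs Predictive Optimization, which handles maintenance.
        run_sql(f"ALTER TABLE {table} CLUSTER BY AUTO")

issued = []  # stand-in for a real SQL executor: just records the statements
migrate_to_liquid(issued.append, "sales", ["col1", "col2"])
for statement in issued:
    print(statement)
```

Passing the executor in as a parameter keeps the helper testable locally; in a notebook you would pass spark.sql instead.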

My Personal Benchmarks & Recommendation

I tested Z-order, Standard Liquid, and Auto Liquid with the exact same data and tables. Here is the verdict:

  • Reads: All three perform about the same.

  • Writes/Optimization: Auto Liquid is definitely the fastest.

  • Cost (My Pick): I personally stick to Standard Liquid Clustering to save money. Auto Liquid uses Predictive Optimization, which runs on Serverless compute and adds extra costs. Standard Liquid gives you all the incremental speed benefits over Z-order, but leaves you in control of your compute bill!

balajij8
Contributor III

@Rupa0503 

 
Both are optimization approaches for Delta Lake query performance but differ in flexibility and maintenance.
 
Z-Ordering is an optimization approach that co-locates related data across multiple columns within the same files, based on the columns you choose.
  • You manually specify columns via OPTIMIZE table ZORDER BY (col1, col2) and run OPTIMIZE periodically to maintain the layout as data grows. It's ideal for stable, read-heavy legacy workloads with predictable filter patterns.
  • During OPTIMIZE, files are rewritten to interleave values across the specified dimensions, improving data skipping for multi-column filters.
  • You can use Z-Ordering for legacy tables with stable filter patterns; it works best on high-cardinality columns that appear often in query predicates.
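The "interleave values across dimensions" idea can be illustrated with a toy Z-order (Morton) curve in plain Python. This is a simplified sketch, not Delta Lake's actual implementation: interleave_bits is a hypothetical helper, the data is a made-up 4x4 grid of (col1, col2) values, and each "file" holds four rows.

```python
def interleave_bits(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z

# Every (col1, col2) combination on a 4x4 grid.
points = [(x, y) for x in range(4) for y in range(4)]

# Sort rows by their interleaved value, then cut the sorted run into files.
ordered = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))
files = [ordered[i:i + 4] for i in range(0, len(ordered), 4)]

for f in files:
    print(f)
```

Each file ends up holding a 2x2 block of the grid, so a filter on either column alone can skip half of the files. Sorting by col1 only would make col1 filters very selective but leave col2 filters reading every file.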
 
Liquid Clustering is the modern approach and the one I recommend for new tables. It uses a tree-based algorithm to incrementally organize data by clustering keys without full rewrites.
  • Dynamic: Change clustering keys anytime via CLUSTER BY (cols) without rewriting existing data
  • Automatic & Incremental: Supports CLUSTER BY AUTO to let Databricks select optimal keys based on query history.
  • Handles complexity: Better for high-cardinality columns, skewed data, or evolving query patterns.
  • Use Liquid Clustering for new tables with high-cardinality filters, concurrent writes or when query patterns evolve
