When to Use and When Not to Use Liquid Clustering?

pooja_bhumandla
Databricks Partner

 

Hi everyone,

I’m looking for practical guidance and real-world experience on when to choose Liquid Clustering over traditional partitioning + Z-ordering.

From what I’ve gathered so far:

  • For small tables (<10 TB), Liquid Clustering performs about the same as partitioning + Z-ordering when queries consistently filter on 1–2 columns.
  • For lookups that filter on more than two columns, partitioning with Z-ordering might offer better and more predictable read performance.
  • The number of files and the number of clustering columns also seem to affect efficiency: too many clustering keys (e.g., 4+) may hurt performance for single-column lookups.
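To make the comparison concrete, here is a minimal sketch of the two layouts being weighed, written as a small Python helper that only builds the Databricks SQL statements (it does not run them). The table name `main.sales.events`, the column list, and the key choices are hypothetical placeholders, not taken from a real workload.

```python
# Hypothetical table and columns, for illustration only.
TABLE = "main.sales.events"
COLUMNS = "event_id BIGINT, customer_id BIGINT, event_date DATE, country STRING"


def liquid_clustering_sql(table, columns, cluster_keys):
    """Build the statements for a liquid-clustered Delta table."""
    # Databricks documents a maximum of four clustering keys per table.
    if not 1 <= len(cluster_keys) <= 4:
        raise ValueError("liquid clustering takes 1 to 4 clustering keys")
    keys = ", ".join(cluster_keys)
    return [
        f"CREATE TABLE {table} ({columns}) USING DELTA CLUSTER BY ({keys})",
        # OPTIMIZE triggers clustering; no ZORDER BY clause is used here.
        f"OPTIMIZE {table}",
    ]


def partition_zorder_sql(table, columns, partition_cols, zorder_cols):
    """Build the statements for the traditional partitioning + Z-ordering layout."""
    parts = ", ".join(partition_cols)
    zcols = ", ".join(zorder_cols)
    return [
        f"CREATE TABLE {table} ({columns}) USING DELTA PARTITIONED BY ({parts})",
        # Z-ordering is applied by OPTIMIZE and has to be re-run as data arrives.
        f"OPTIMIZE {table} ZORDER BY ({zcols})",
    ]


if __name__ == "__main__":
    for stmt in liquid_clustering_sql(TABLE, COLUMNS, ["customer_id", "event_date"]):
        print(stmt)
    for stmt in partition_zorder_sql(TABLE, COLUMNS, ["event_date"], ["customer_id"]):
        print(stmt)
```

The two layouts are alternatives rather than something to combine: a liquid-clustered table does not use PARTITIONED BY or ZORDER BY, so the choice is made per table.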

But I’d love to hear from others:

  • How do you decide when Liquid Clustering is worth it?
  • Have you seen clear performance gains (or drawbacks) based on table size, number of clustering columns, or file count?
  • Any best practices or gotchas from your real-world implementations?

Appreciate any insights, benchmarks, or rules of thumb the community can share!