Re: Seeking Insights on Liquid Clustering (LC) Based on Table Sizes

pooja_bhumandla · ‎11-12-2025

Hi all,

I'm exploring Liquid Clustering (LC) and its effectiveness based on the size of the tables.
Specifically, I’m interested in understanding how LC behaves with small, medium, and large tables and the best practices for each, along with size ranges for each category.

Any recommendations or best practices for applying LC across different table sizes would be appreciated!
Looking forward to hearing about your experiences and insights on whether LC should be adopted at various data scales and what the tangible benefits are.

Thanks in advance for sharing your knowledge!

bianca_unifeye · ‎11-12-2025

Liquid Clustering replaces manual partitioning and Z-Ordering with adaptive file clustering.
It keeps your data physically organized for faster queries and merges, without forcing you to manage partition columns or compaction jobs.

It’s driven by the CLUSTER BY keys you choose, Delta’s internal clustering metadata, and incremental reclustering that runs as part of OPTIMIZE (triggered manually or, where predictive optimization is enabled, automatically).
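
To make "cluster-by keys" a bit more concrete, here is a minimal Python sketch that only builds the SQL statements one would typically run. The helper functions and the table and column names (`sales.events`, `customer_id`, `event_ts`) are hypothetical and not part of any Databricks library. The `CLUSTER BY` and `OPTIMIZE` statements follow the Databricks SQL syntax as I understand it, and the four-key limit is the documented maximum at the time of writing, so please check both against the docs for your runtime.

```python
from typing import Sequence

# Assumption: Databricks documents a maximum of four clustering keys.
MAX_CLUSTER_KEYS = 4


def _key_list(keys: Sequence[str]) -> str:
    """Validate the number of keys and render them as a comma-separated list."""
    if not 1 <= len(keys) <= MAX_CLUSTER_KEYS:
        raise ValueError(
            f"expected 1 to {MAX_CLUSTER_KEYS} clustering keys, got {len(keys)}"
        )
    return ", ".join(keys)


def enable_clustering_sql(table: str, keys: Sequence[str]) -> str:
    """SQL that sets (or changes) the clustering keys of an existing Delta table."""
    return f"ALTER TABLE {table} CLUSTER BY ({_key_list(keys)})"


def optimize_sql(table: str) -> str:
    """SQL that triggers incremental reclustering of the table."""
    return f"OPTIMIZE {table}"


def disable_clustering_sql(table: str) -> str:
    """SQL that removes the clustering keys from the table."""
    return f"ALTER TABLE {table} CLUSTER BY NONE"
```

In a notebook you would pass the result to `spark.sql(...)`, for example `spark.sql(enable_clustering_sql("sales.events", ["customer_id", "event_ts"]))` followed by `spark.sql(optimize_sql("sales.events"))`.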

Table Size	Rough Range	LC Benefit	Notes
Small	< 10 GB or < 50 million rows	Limited	Metadata overhead may outweigh the benefit. Stick with Delta defaults or a small Z-ORDER.
Medium	10 GB – 1 TB or 50M–1B rows	Strong	Ideal range: LC improves scan times, merges, and compaction efficiency.
Large	> 1 TB or billions of rows	Very high	Major gains in data skipping and read performance, especially for multi-year or multi-tenant data.
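
If you want to bucket a list of tables with these rough thresholds, something like the sketch below could work. It is only an illustration: the function name is made up, it looks at size in bytes only (the row-count ranges are ignored), and it assumes binary units (1 GB = 1024^3 bytes), which the table above does not specify.

```python
GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4

# Expected LC benefit per category, copied from the table above.
EXPECTED_BENEFIT = {
    "small": "limited",
    "medium": "strong",
    "large": "very high",
}


def size_category(size_bytes: int) -> str:
    """Map a table size in bytes to the rough small/medium/large buckets."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 10 * GB:
        return "small"
    if size_bytes <= 1 * TB:
        return "medium"
    return "large"


# Example: a 200 GB table falls in the medium bucket.
category = size_category(200 * GB)
benefit = EXPECTED_BENEFIT[category]
```

The boundaries are rules of thumb, so a table just under 10 GB with heavy MERGE traffic may still be worth testing with LC.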

You can find more in the documentation. If you have a specific use case rather than a generic one, I am more than happy to advise.

https://docs.databricks.com/aws/en/delta/clustering

https://docs.databricks.com/aws/en/delta/best-practices

https://docs.databricks.com/aws/en/delta/optimize

pooja_bhumandla · ‎11-12-2025

Hi @bianca_unifeye , thank you for your response.

My tables range in size from 1 KB to 5 TB. Given this, I’d love to hear your thoughts and experiences on whether Liquid Clustering (LC) would be a good fit in this scenario.

Thanks in advance for sharing your knowledge!

bianca_unifeye · ‎11-14-2025

For tables ranging from 1 KB to 5 TB, you’ll usually end up with a mixed strategy. LC is not “all or nothing”; it shines when the table’s physical size and update pattern justify the clustering overhead.

Use Liquid Clustering when:

clustering keys have natural selectivity (e.g., customer_id, timestamp)
MERGE/DELETE/UPDATE operations happen regularly
the table grows continuously
multiple teams access different slices of the data
you want predictable performance without manual tuning

Avoid Liquid Clustering when:

data is tiny
table rarely changes
workload is sequential full scans
cluster-by keys have low cardinality
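
The two checklists above can be turned into a small screening helper when you have many tables to review. This is a rough sketch, not an official rule: the `TableProfile` fields and the decision logic (any "avoid" reason vetoes LC, otherwise at least one "use" reason is needed) are my own simplification, and a few of the softer criteria, such as multiple teams reading different slices, are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableProfile:
    """Hypothetical description of a table's size and workload."""
    is_tiny: bool
    rarely_changes: bool
    mostly_full_scans: bool
    keys_low_cardinality: bool
    frequent_dml: bool        # regular MERGE/DELETE/UPDATE
    grows_continuously: bool
    selective_keys: bool      # e.g. customer_id, timestamp


def reasons_to_avoid(p: TableProfile) -> List[str]:
    """Collect the 'avoid' criteria that apply to this table."""
    checks = [
        (p.is_tiny, "data is tiny"),
        (p.rarely_changes, "table rarely changes"),
        (p.mostly_full_scans, "workload is sequential full scans"),
        (p.keys_low_cardinality, "cluster-by keys have low cardinality"),
    ]
    return [reason for applies, reason in checks if applies]


def reasons_to_use(p: TableProfile) -> List[str]:
    """Collect the 'use' criteria that apply to this table."""
    checks = [
        (p.selective_keys, "clustering keys have natural selectivity"),
        (p.frequent_dml, "MERGE/DELETE/UPDATE happen regularly"),
        (p.grows_continuously, "table grows continuously"),
    ]
    return [reason for applies, reason in checks if applies]


def suggest_lc(p: TableProfile) -> bool:
    """Suggest LC only if nothing argues against it and something argues for it."""
    return not reasons_to_avoid(p) and bool(reasons_to_use(p))
```

Treat the result as a first-pass filter: borderline tables are better judged by testing query and MERGE times before and after clustering.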

