Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Seeking Insights on Liquid Clustering (LC) Based on Table Sizes

pooja_bhumandla
New Contributor III

Hi all,

I'm exploring Liquid Clustering (LC) and its effectiveness based on the size of the tables.
Specifically, I'm interested in understanding how LC behaves with small, medium, and large tables and the best practices for each, along with size ranges for each category.

Any recommendations or best practices for applying LC across different table sizes would be appreciated!
Looking forward to hearing about your experiences and insights on whether LC should be adopted at various data scales and what the tangible benefits are.

Thanks in advance for sharing your knowledge!

2 REPLIES

bianca_unifeye
New Contributor II

Liquid Clustering replaces manual partitioning and Z-Ordering with adaptive file clustering.
It keeps your data physically organized for faster queries and merges, without forcing you to manage partition columns or compaction jobs.

It's driven by the CLUSTER BY keys you choose, Delta's internal clustering metadata, and incremental reclustering that is applied when OPTIMIZE runs (automatically, if predictive optimization is enabled on the table).
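To make the cluster-by keys idea concrete, here is a minimal sketch that builds the Databricks SQL statements for enabling LC on an existing Delta table and then triggering clustering. The helper name `liquid_clustering_statements`, the table `sales.events`, and the column names are hypothetical, and the four-key limit is an assumption based on the documented maximum, so check the clustering docs for your runtime.

```python
import re

# Assumption: Databricks documents a maximum of four clustering keys per table.
MAX_CLUSTER_KEYS = 4

# Conservative identifier pattern so the generated SQL never contains stray characters.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name):
    """Validate a (possibly dot-qualified) table or column name and return it."""
    for part in name.split("."):
        if not _IDENT.match(part):
            raise ValueError(f"unsupported identifier: {name!r}")
    return name


def liquid_clustering_statements(table, keys):
    """Return the SQL statements that set clustering keys and then recluster."""
    if not keys:
        raise ValueError("at least one clustering key is required")
    if len(keys) > MAX_CLUSTER_KEYS:
        raise ValueError(f"at most {MAX_CLUSTER_KEYS} clustering keys are allowed")
    _check_identifier(table)
    columns = ", ".join(_check_identifier(k) for k in keys)
    return [
        # Declares the keys; existing files are not rewritten by this statement.
        f"ALTER TABLE {table} CLUSTER BY ({columns})",
        # Clusters data incrementally according to the declared keys.
        f"OPTIMIZE {table}",
    ]
```

In a notebook you would typically pass each returned string to `spark.sql(...)`, for example for `liquid_clustering_statements("sales.events", ["tenant_id", "event_date"])`.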

Table Size | Rough Range | LC Benefit | Notes
Small | < 10 GB or < 50 million rows | Limited | Metadata overhead may outweigh benefit. Stick with Delta defaults or small Z-ORDER.
Medium | 10 GB – 1 TB or 50M–1B rows | Strong | Ideal range: LC improves scan times, merges, and compaction efficiency.
Large | > 1 TB or billions of rows | Very high | Major gains in data skipping and read performance, especially for multi-year or multi-tenant data.
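If you want to apply these rough ranges across many tables, a small sketch like the following can bucket each table by its size in bytes. The function name `lc_size_category` is hypothetical, the binary units (1 GB = 1024^3 bytes) and the choice to treat exactly 1 TB as medium are assumptions, and the thresholds are only the rules of thumb from the table above, not official limits.

```python
# Assumption: binary units; the ranges above do not say which convention is meant.
GB = 1024 ** 3
TB = 1024 * GB

# Expected benefit per category, copied from the rough ranges above.
EXPECTED_LC_BENEFIT = {
    "small": "limited",
    "medium": "strong",
    "large": "very high",
}


def lc_size_category(size_bytes):
    """Bucket a table into small / medium / large by on-disk size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 10 * GB:
        return "small"
    if size_bytes <= 1 * TB:  # assumption: exactly 1 TB counts as medium
        return "medium"
    return "large"


def expected_benefit(size_bytes):
    """Look up the rule-of-thumb LC benefit for a table of the given size."""
    return EXPECTED_LC_BENEFIT[lc_size_category(size_bytes)]
```

The size itself can be read from the `sizeInBytes` column that `DESCRIBE DETAIL <table>` returns for a Delta table.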

You can find more in the documentation. If you have a specific case rather than a generic one, I'm more than happy to advise.

https://docs.databricks.com/aws/en/delta/clustering

https://docs.databricks.com/aws/en/delta/best-practices

https://docs.databricks.com/aws/en/delta/optimize

 

pooja_bhumandla
New Contributor III

Hi @bianca_unifeye , thank you for your response.

My tables range in size from 1 KB to 5 TB. Given this, I'd love to hear your thoughts and experiences on whether Liquid Clustering (LC) would be a good fit in this scenario.

Thanks in advance for sharing your knowledge!