11-12-2025 03:00 AM
Hi all,
I'm exploring Liquid Clustering (LC) and its effectiveness based on the size of the tables.
Specifically, I’m interested in understanding how LC behaves with small, medium, and large tables and the best practices for each, along with size ranges for each category.
Any recommendations or best practices for applying LC across different table sizes would be appreciated!
Looking forward to hearing about your experiences and insights on whether LC should be adopted at various data scales and what the tangible benefits are.
Thanks in advance for sharing your knowledge!
11-12-2025 03:46 AM
Liquid Clustering replaces manual partitioning and Z-Ordering with adaptive file clustering.
It keeps your data physically organized for faster queries and merges, without forcing you to manage partition columns or compaction jobs.
It’s powered by cluster-by keys, Delta’s internal clustering metadata, and automatic reclustering handled by the Delta optimizer.
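As a rough sketch of what enabling it looks like, the helpers below build the Databricks SQL statements for declaring cluster-by keys on a new table, changing them on an existing one, and triggering clustering with `OPTIMIZE`. The helper function names, the table name `sales.events`, and its columns are made up for illustration; in a notebook you would pass the resulting strings to `spark.sql(...)`.

```python
def create_clustered_table_sql(table, columns, cluster_keys):
    """Build a CREATE TABLE statement that declares liquid clustering keys.

    `columns` is a list of (name, type) pairs; all names here are hypothetical.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    keys = ", ".join(cluster_keys)
    return f"CREATE TABLE {table} ({cols}) CLUSTER BY ({keys})"


def alter_cluster_keys_sql(table, cluster_keys):
    """Build an ALTER TABLE statement that sets or changes the clustering keys."""
    keys = ", ".join(cluster_keys)
    return f"ALTER TABLE {table} CLUSTER BY ({keys})"


def optimize_sql(table):
    """Build the OPTIMIZE statement that triggers clustering of the table."""
    return f"OPTIMIZE {table}"


if __name__ == "__main__":
    # Hypothetical table and columns, purely for illustration.
    print(create_clustered_table_sql(
        "sales.events",
        [("customer_id", "BIGINT"), ("event_ts", "TIMESTAMP")],
        ["customer_id", "event_ts"],
    ))
    print(alter_cluster_keys_sql("sales.events", ["customer_id"]))
    print(optimize_sql("sales.events"))
```

Whether you need to run `OPTIMIZE` yourself depends on how maintenance is set up in your workspace, so treat the last statement as optional.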
| Table Size | Rough Range | LC Benefit | Notes |
|---|---|---|---|
| Small | < 10 GB or < 50 million rows | Limited | Metadata overhead may outweigh benefit. Stick with Delta defaults or small Z-ORDER. |
| Medium | 10 GB – 1 TB or 50M–1B rows | Strong | Ideal range — LC improves scan times, merges, and compaction efficiency. |
| Large | > 1 TB or billions of rows | Very high | Major gains in data skipping and read performance, especially for multi-year or multi-tenant data. |
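If you want to apply these ranges across a catalog, a minimal sketch of the size buckets might look like the following. The function name is hypothetical, the thresholds are the rough ones from the table (not official limits), and I'm assuming 1 GB = 1024³ bytes. It only looks at bytes, not row counts.

```python
GB = 1024 ** 3  # assumption: binary gigabytes
TB = 1024 ** 4


def lc_benefit_by_size(size_bytes):
    """Map a table size in bytes to the (category, expected LC benefit) pair.

    Thresholds mirror the rough ranges in the table above; they are
    rules of thumb, not hard limits.
    """
    if size_bytes < 10 * GB:
        return ("Small", "Limited")
    if size_bytes <= 1 * TB:
        return ("Medium", "Strong")
    return ("Large", "Very high")


if __name__ == "__main__":
    for size in (1024, 100 * GB, 5 * TB):
        print(size, lc_benefit_by_size(size))
```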
You can find more in the documentation. If you have a specific case rather than a generic one, I am more than happy to advise.
https://docs.databricks.com/aws/en/delta/clustering
https://docs.databricks.com/aws/en/delta/best-practices
https://docs.databricks.com/aws/en/delta/optimize
11-12-2025 05:53 AM
Hi @bianca_unifeye, thank you for your response.
My tables range in size from 1 KB to 5 TB. Given this, I’d love to hear your thoughts and experiences on whether Liquid Clustering (LC) would be a good fit in this scenario.
Thanks in advance for sharing your knowledge!
11-14-2025 02:08 AM
For tables ranging from 1 KB → 5 TB, you’ll usually end up with a mixed strategy. LC is not “all or nothing”; it shines when the physical size + update pattern justify the clustering overhead.
Use Liquid Clustering when:
- clustering keys have natural selectivity (e.g., customer_id, timestamp)
- MERGE/DELETE/UPDATE operations happen regularly
- the table grows continuously
- multiple teams access different slices of the data
- you want predictable performance without manual tuning
Avoid Liquid Clustering when:
- data is tiny
- table rarely changes
- workload is sequential full scans
- cluster-by keys have low cardinality
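For a mixed estate like yours, one way to apply the "avoid" checklist is to run it per table and only enable LC where nothing objects. The sketch below is just that idea in plain Python. `TableProfile`, `reasons_to_avoid_lc`, and both thresholds are my own assumptions for illustration (the cardinality cutoff in particular is an arbitrary placeholder), so tune them to your workload.

```python
from dataclasses import dataclass

TINY_BYTES = 10 * 1024 ** 3   # assumption: "tiny" means under roughly 10 GB
MIN_KEY_CARDINALITY = 100     # assumption: arbitrary placeholder cutoff


@dataclass
class TableProfile:
    """Hypothetical per-table facts you would collect yourself."""
    name: str
    size_bytes: int
    changes_regularly: bool    # regular MERGE/DELETE/UPDATE or continuous growth
    mostly_full_scans: bool    # workload is dominated by sequential full scans
    key_cardinality: int       # distinct values in the candidate cluster-by key


def reasons_to_avoid_lc(table):
    """Return the checklist items that argue against LC for this table."""
    reasons = []
    if table.size_bytes < TINY_BYTES:
        reasons.append("data is tiny")
    if not table.changes_regularly:
        reasons.append("table rarely changes")
    if table.mostly_full_scans:
        reasons.append("workload is sequential full scans")
    if table.key_cardinality < MIN_KEY_CARDINALITY:
        reasons.append("cluster-by keys have low cardinality")
    return reasons


if __name__ == "__main__":
    tables = [
        TableProfile("ref.country_codes", 1024, False, True, 5),
        TableProfile("sales.events", 5 * 1024 ** 4, True, False, 2_000_000),
    ]
    for t in tables:
        reasons = reasons_to_avoid_lc(t)
        verdict = "use LC" if not reasons else "skip LC: " + "; ".join(reasons)
        print(f"{t.name}: {verdict}")
```

An empty list means none of the "avoid" conditions apply, which is a starting point for testing LC on that table rather than a guarantee it will help.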