02-29-2024 01:13 PM - edited 02-29-2024 01:29 PM
My question is pretty straightforward: how big should a Delta table be to benefit from liquid clustering? I know the answer will most likely depend on how you are querying the data, but what is the general recommendation?
I know Databricks recommends not partitioning tables smaller than 1 TB and aiming for roughly 1 GB partitions. Does this hold true for liquid clustering?
And vice versa: will clustering a small table (< 1 TB, or even < 1 GB) hinder query performance?
I have been looking for documentation or resources that dive into these details but can't seem to find any; everything I've found online just covers the basics. Is there something like this out there?
Thanks in advance for the help.
- Labels: Delta Lake, Spark
Accepted Solutions
03-04-2024 12:51 AM
@DatBoi
Once you watch this video you'll understand more about Liquid Clustering 🙂
https://www.youtube.com/watch?v=5t6wX28JC_M&ab_channel=DeltaLake
Long story short:
I know Databricks recommends not partitioning on tables less than 1 TB and aim for 1 GB partitions. Does this hold true for liquid clustering?
Clustering is a bit different from partitioning. The main issue with partitioning tables smaller than 1 TB is that it can create a lot of small files, which can negatively impact performance. Liquid clustering doesn't have that issue, because data is not physically split into separate directories per key value.
And vice versa - will clustering on a small table < 1TB or even < 1GB hinder the performance of queries?
From my experience - not really.
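For reference, turning liquid clustering on is just a matter of declaring clustering keys on the table. A minimal sketch using the Databricks `CLUSTER BY` syntax (the table and column names here are hypothetical):

```sql
-- Create a Delta table with liquid clustering on two keys.
-- Note: CLUSTER BY replaces PARTITIONED BY; you can't use both on one table.
CREATE TABLE events (
  event_id BIGINT,
  event_ts TIMESTAMP,
  country  STRING
)
CLUSTER BY (country, event_ts);

-- Data is (re)clustered incrementally when you run OPTIMIZE.
OPTIMIZE events;
```

You can also change the keys later with `ALTER TABLE events CLUSTER BY (...)` without rewriting the table, which is part of why it's lower-risk than partitioning on a small table.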
03-04-2024 10:53 AM
Got it - I'll take a look at that video. Thanks for the response.

