How big should a delta table be to benefit from liquid clustering?

DatBoi
Contributor

My question is pretty straightforward - how big should a Delta table be to benefit from liquid clustering? I know the answer will most likely depend on the details of how you are querying the data, but what is the recommendation?

I know Databricks recommends not partitioning tables smaller than 1 TB and aiming for partitions of about 1 GB. Does this hold true for liquid clustering?

And vice versa - will clustering a small table (< 1 TB or even < 1 GB) hinder the performance of queries?

I have been looking for some documentation / resources to dive into these details but can't seem to find any. Everything I have found online is just covering the basics. Is there something like this out there?

Thanks in advance for the help.

1 ACCEPTED SOLUTION


daniel_sahal
Esteemed Contributor

@DatBoi 
Once you watch this video you'll understand more about Liquid Clustering 🙂

https://www.youtube.com/watch?v=5t6wX28JC_M&ab_channel=DeltaLake

Long story short:

"I know Databricks recommends not partitioning tables smaller than 1 TB and aiming for partitions of about 1 GB. Does this hold true for liquid clustering?"


Clustering is a little bit different from partitioning. The main issue with partitioning tables smaller than 1 TB is that it can create a lot of small files, which can hurt query performance. Liquid clustering doesn't have that problem, because it does not split the data into one directory per partition value.
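To put rough numbers on the small-files point, here is a minimal back-of-the-envelope sketch. It assumes a deliberately simplified model: a partitioned table holds at least one file per partition value, while a layout that targets a file size just packs the data into files of about that size. The function names, the 50 GiB table, the 10,000 partition values, and the 1 GiB target are all made up for illustration (the target is borrowed from the 1 GB guidance in the question); the file sizes liquid clustering actually produces are decided by the engine, not by this formula.

```python
import math

GIB = 1024 ** 3  # bytes in one GiB


def partitioned_layout(table_bytes: int, num_partitions: int) -> tuple[int, float]:
    """Best case for a partitioned table: exactly one file per partition value.

    Returns (number of files, average file size in bytes). Real tables usually
    have more files than this, so the average is an upper bound on file size.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return num_partitions, table_bytes / num_partitions


def size_targeted_layout(table_bytes: int, target_file_bytes: int) -> tuple[int, float]:
    """Hypothetical layout that packs data into files of roughly the target size.

    Returns (number of files, average file size in bytes).
    """
    if target_file_bytes < 1:
        raise ValueError("target_file_bytes must be at least 1")
    num_files = max(1, math.ceil(table_bytes / target_file_bytes))
    return num_files, table_bytes / num_files


if __name__ == "__main__":
    table = 50 * GIB  # illustrative table size, well under 1 TB

    files, avg = partitioned_layout(table, num_partitions=10_000)
    print(f"partitioned:   {files} files, ~{avg / 1024 ** 2:.1f} MiB each")

    files, avg = size_targeted_layout(table, target_file_bytes=GIB)
    print(f"size-targeted: {files} files, ~{avg / 1024 ** 2:.1f} MiB each")
```

Under these assumptions the partitioned table ends up with thousands of files of a few MiB each, which is the small-files problem described above, while the size-targeted layout keeps the file count low regardless of how many distinct values the column has.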


"And vice versa - will clustering on a small table < 1TB or even < 1GB hinder the performance of queries?"


From my experience - not really. 
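If you want to check this on your own small table rather than take it on faith, a minimal sketch of the statements involved is below. The helpers only build SQL strings; the helper names, the table name `events_small`, and its columns are hypothetical. The `CLUSTER BY` clause and `OPTIMIZE` are Databricks SQL syntax for liquid clustering, but runtime version requirements are not covered here, so treat this as a starting point and confirm against the docs for your workspace.

```python
def _cluster_clause(cluster_cols: list[str]) -> str:
    """Render the CLUSTER BY clause for the given clustering columns."""
    if not cluster_cols:
        raise ValueError("at least one clustering column is required")
    return f"CLUSTER BY ({', '.join(cluster_cols)})"


def create_clustered_table_sql(table: str, columns_ddl: str, cluster_cols: list[str]) -> str:
    """SQL to create a new Delta table with liquid clustering enabled."""
    return f"CREATE TABLE {table} ({columns_ddl}) USING DELTA {_cluster_clause(cluster_cols)}"


def alter_cluster_sql(table: str, cluster_cols: list[str]) -> str:
    """SQL to set or change the clustering columns on an existing table."""
    return f"ALTER TABLE {table} {_cluster_clause(cluster_cols)}"


def optimize_sql(table: str) -> str:
    """SQL to trigger clustering of the data that has already been written."""
    return f"OPTIMIZE {table}"


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    statements = [
        create_clustered_table_sql(
            "events_small", "event_id BIGINT, event_date DATE", ["event_date"]
        ),
        optimize_sql("events_small"),
    ]
    for stmt in statements:
        print(stmt)
```

In a Databricks notebook each string could be passed to `spark.sql(...)`; timing the same filter query on `event_date` before and after is a simple way to see whether clustering helps or hurts at your table size.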


2 REPLIES

Got it - will take a look at that video. Thanks for the response. 
