
How big should a Delta table be to benefit from liquid clustering?

DatBoi
Contributor

My question is pretty straightforward - how big should a Delta table be to benefit from liquid clustering? I know the answer will most likely depend on the details of how you are querying the data, but what is the recommendation?

I know Databricks recommends not partitioning tables smaller than 1 TB and aiming for partitions of about 1 GB. Does this hold true for liquid clustering?

And vice versa - will clustering on a small table < 1TB or even < 1GB hinder the performance of queries?

I have been looking for some documentation / resources to dive into these details but can't seem to find any. Everything I have found online is just covering the basics. Is there something like this out there?

Thanks in advance for the help.

Accepted Solutions

daniel_sahal
Esteemed Contributor

@DatBoi 
Once you watch this video you'll understand more about Liquid Clustering 🙂

https://www.youtube.com/watch?v=5t6wX28JC_M&ab_channel=DeltaLake

Long story short:

I know Databricks recommends not partitioning tables smaller than 1 TB and aiming for partitions of about 1 GB. Does this hold true for liquid clustering?


Clustering is a little bit different from partitioning. The main issue with partitioning tables smaller than 1 TB is that it can create a lot of small files, which can hurt performance. With liquid clustering there's no such issue, since the data isn't split into a separate directory per partition value.
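To put rough numbers on the small-files point, here is a back-of-the-envelope sketch. It assumes data spreads evenly across partition values and that files are compacted toward a 1 GB target. Both are simplifications, real target file sizes vary, and the function names are made up for illustration.

```python
import math

GB = 1024 ** 3
MB = 1024 ** 2


def partitioned_file_stats(table_bytes, distinct_partition_values, target_file_bytes=GB):
    """Estimate file count and average file size for a Hive-style partitioned table.

    Assumption: rows are spread evenly across partition values, and every
    partition holds at least one file.
    """
    bytes_per_partition = table_bytes / distinct_partition_values
    files_per_partition = max(1, math.ceil(bytes_per_partition / target_file_bytes))
    total_files = files_per_partition * distinct_partition_values
    avg_file_bytes = table_bytes / total_files
    return total_files, avg_file_bytes


def unpartitioned_file_count(table_bytes, target_file_bytes=GB):
    """Estimate file count when files are not forced apart by partition directories."""
    return max(1, math.ceil(table_bytes / target_file_bytes))


# Hypothetical example: a 100 GB table partitioned on a column with 10,000 distinct values.
files, avg_bytes = partitioned_file_stats(100 * GB, 10_000)
print(f"partitioned: {files} files, about {avg_bytes / MB:.2f} MB each")
print(f"not partitioned: {unpartitioned_file_count(100 * GB)} files of about 1 GB")
```

With these assumed numbers, partitioning forces at least one file per partition value, so the table ends up as thousands of files of roughly 10 MB instead of about a hundred files near the target size.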


And vice versa - will clustering on a small table < 1TB or even < 1GB hinder the performance of queries?


From my experience - not really. 
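If you want to check this on your own small table, one option is to enable clustering and compare query timings before and after. A minimal sketch that only builds the SQL statements, assuming Databricks-style `CLUSTER BY` syntax; the table name `events` and its columns are hypothetical, and the helper functions are not part of any library.

```python
def create_clustered_table_sql(table, columns, cluster_by):
    """Build a CREATE TABLE statement with liquid clustering keys."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    keys = ", ".join(cluster_by)
    return f"CREATE TABLE {table} ({cols}) USING DELTA CLUSTER BY ({keys})"


def enable_clustering_sql(table, cluster_by):
    """Build an ALTER TABLE statement that sets clustering keys on an existing table."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(cluster_by)})"


def optimize_sql(table):
    """Build the OPTIMIZE statement that triggers clustering of the data."""
    return f"OPTIMIZE {table}"


# Hypothetical table and columns, for illustration only.
statements = [
    create_clustered_table_sql(
        "events", {"event_id": "BIGINT", "event_date": "DATE"}, ["event_date"]
    ),
    optimize_sql("events"),
]
for stmt in statements:
    print(stmt)
    # In a notebook with an active session you could run: spark.sql(stmt)
```

Timing the same filter query on `event_date` against a clustered and an unclustered copy of the table should show whether clustering helps or hurts at your data size.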

2 REPLIES


Got it - will take a look at that video. Thanks for the response. 
