Data Engineering

Liquid clustering did not improve performance

SusmithaBadam
New Contributor II

Hi There,

I have a 160 GB table partitioned on the country and yearmonth columns. I keep 6 years of history and replace the latest 2 months of partitions to add the new data.
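For readers following along, here is a minimal sketch of how such a two-month partition replacement could look with Delta Lake's `replaceWhere` write option. The `YYYYMM` string format for `yearmonth`, the helper names, and the table name passed in are all assumptions for illustration; the post does not say how the overwrite is actually implemented.

```python
from __future__ import annotations

from datetime import date


def latest_yearmonths(today: date, months: int = 2) -> list[str]:
    """Return the most recent `months` keys as 'YYYYMM' strings, oldest first.

    Assumption: yearmonth is stored as a 'YYYYMM' string.
    """
    keys = []
    year, month = today.year, today.month
    for _ in range(months):
        keys.append(f"{year:04d}{month:02d}")
        month -= 1
        if month == 0:  # roll back over the year boundary
            year, month = year - 1, 12
    return sorted(keys)


def replace_where_predicate(yearmonths: list[str]) -> str:
    """Build the predicate that limits the overwrite to the given months."""
    quoted = ", ".join(f"'{ym}'" for ym in yearmonths)
    return f"yearmonth IN ({quoted})"


def overwrite_latest_partitions(df, table_name: str, yearmonths: list[str]) -> None:
    """Overwrite only the rows matching the predicate (hypothetical helper).

    `df` is expected to be a Spark DataFrame that contains data for the
    listed months only; Delta rejects rows that fall outside the predicate.
    """
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", replace_where_predicate(yearmonths))
        .saveAsTable(table_name)
    )
```

Only the two pure-Python helpers run without a Spark session; `overwrite_latest_partitions` needs a real DataFrame and an existing Delta table.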

I use overwrite mode to replace the affected partitions. The entire ETL process runs without any failure, but the data partitions are heavily skewed. I did a POC with liquid clustering on a copy of the table reduced to 45 GB, but did not see much improvement.
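A POC like this could be set up roughly as follows. This is only a sketch: the table names are hypothetical, and clustering on `country` and `yearmonth` is an assumption, since the post does not say which clustering keys were chosen. The helper just builds the SQL text so it can be reviewed before running it with `spark.sql(...)`.

```python
from __future__ import annotations


def liquid_clustering_statements(
    source_table: str, target_table: str, cluster_columns: list[str]
) -> list[str]:
    """Return SQL statements that copy a table into a liquid-clustered one.

    The first statement is a CTAS with CLUSTER BY; the second runs OPTIMIZE,
    which is what actually clusters the data that has been written.
    """
    cols = ", ".join(cluster_columns)
    return [
        f"CREATE TABLE {target_table} CLUSTER BY ({cols}) "
        f"AS SELECT * FROM {source_table}",
        f"OPTIMIZE {target_table}",
    ]


# Hypothetical table names, for illustration only.
for stmt in liquid_clustering_statements(
    "sales_partitioned", "sales_clustered", ["country", "yearmonth"]
):
    print(stmt)
```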

Observation:

A SELECT with GROUP BY on the clustered table (after running OPTIMIZE) takes 39 sec, whereas the same query on the partitioned table takes 2 sec. Write performance is better, but read performance is much worse.
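When comparing timings like these, it can help to repeat each query a few times so that caching and cluster warm-up do not dominate a single measurement. Below is a minimal, generic timing helper; the function name and the query in the usage comment are assumptions, not the exact benchmark behind the numbers above.

```python
import statistics
import time


def time_runs(run, repeats=5, warmup=1):
    """Call `run` several times and summarise wall-clock durations in seconds."""
    for _ in range(warmup):
        run()  # warm-up calls are executed but not measured
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        durations.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


# Usage with Spark (hypothetical table name; requires an active `spark` session):
# query = "SELECT country, yearmonth, COUNT(*) FROM sales_clustered GROUP BY country, yearmonth"
# print(time_runs(lambda: spark.sql(query).collect()))
```

Running the same helper against both the partitioned and the clustered table gives medians that are easier to compare than one-off runs.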

I have attached an Excel sheet with the read/write performance differences. I want to take advantage of liquid clustering, but have had no luck so far.

1 REPLY

Renu_
Valued Contributor II

Hi @SusmithaBadam, based on your use case, the partitioned table performs better because partitions work kind of like labeled folders. When you group by the partition columns, the engine can quickly go to the exact folders instead of scanning everything, so it's much faster.

Liquid clustering, on the other hand, shines when you need to filter on other detailed (high-cardinality) columns, but for your group-by queries on the partition columns, it can’t take that shortcut. So for your current setup, sticking with partitioned tables makes more sense performance-wise.