Databricks Community

SusmithaBadam · 08-04-2025

Hi There,

I have a 160 GB table partitioned on the country and yearmonth columns. I keep six years of history and replace the latest two months of partitions to add new data.

I use overwrite mode to replace the affected partitions. The entire ETL process runs without any failure, but the data partitions are heavily skewed. I did a POC with liquid clustering, reducing the table size to 45 GB, but did not see much improvement.

Observation:

A SELECT with GROUP BY on the clustered table, after OPTIMIZE, takes 39 sec, whereas the same query on the partitioned table takes 2 sec. Write performance is better, but read performance is much worse.

I have attached an Excel file with the read/write performance differences. I want to take advantage of liquid clustering, but have had no luck so far.
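For readers following along, the "replace the latest two months" step described above could be sketched roughly as below. This is only an illustration, not the poster's actual job: it assumes `yearmonth` is stored as a `'YYYYMM'` string, and the country codes, the `new_data` DataFrame and the table name `sales_history` are hypothetical. The runnable part only builds the predicate; the Delta write that would consume it through the `replaceWhere` option is shown as a comment because it needs a Spark session.

```python
from datetime import date


def latest_yearmonths(today: date, months: int = 2) -> list[str]:
    """Return the last `months` year-month keys up to `today`, oldest first.

    Assumes the yearmonth column holds 'YYYYMM' strings (an assumption).
    """
    year, month = today.year, today.month
    keys = []
    for _ in range(months):
        keys.append(f"{year:04d}{month:02d}")
        month -= 1
        if month == 0:  # roll back over a year boundary
            year, month = year - 1, 12
    return sorted(keys)


def replace_where_predicate(countries: list[str], yearmonths: list[str]) -> str:
    """Build a predicate that selects only the slices being reloaded."""
    country_list = ", ".join(f"'{c}'" for c in sorted(countries))
    ym_list = ", ".join(f"'{ym}'" for ym in yearmonths)
    return f"country IN ({country_list}) AND yearmonth IN ({ym_list})"


predicate = replace_where_predicate(["US", "DE"], latest_yearmonths(date(2025, 8, 4)))
print(predicate)
# country IN ('DE', 'US') AND yearmonth IN ('202507', '202508')

# On a cluster, the predicate would be handed to Delta's replaceWhere option
# (new_data and sales_history are hypothetical names):
# (new_data.write.format("delta").mode("overwrite")
#     .option("replaceWhere", predicate)
#     .saveAsTable("sales_history"))
```

Because the predicate is an ordinary filter on column values rather than a reference to partition directories, the same overwrite pattern should carry over if the table is later switched from partitioning to liquid clustering.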

Renu_ · 08-04-2025

Hi @SusmithaBadam, based on your use case, partitioned tables are performing better because they work kind of like labeled folders. When you group by, it can quickly go to the exact folder instead of scanning everything, so it’s much faster.

Liquid clustering, on the other hand, shines when you need to filter on other detailed (high-cardinality) columns, but for your group-by queries on the partition columns, it can’t take that shortcut. So for your current setup, sticking with partitioned tables makes more sense performance-wise.
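The "labeled folders" intuition can be illustrated with a toy model in plain Python. This is a deliberately simplified sketch, not Databricks' actual clustering or data-skipping implementation: it assumes each file exposes only per-column min/max statistics, that every (country, yearmonth) group is the same size, and that the clustered layout simply packs three sorted groups per file. The country codes and months are made up. It shows one possible contributor to the gap, not a diagnosis of the 39 sec vs 2 sec result.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DataFile:
    """One data file, described by the (country, yearmonth) groups it stores."""
    keys: tuple

    def may_contain(self, country: str, yearmonth: str) -> bool:
        # Per-column min/max check: the file is read if both values fall
        # inside its ranges, even when the exact pair is not in the file.
        countries = [c for c, _ in self.keys]
        months = [m for _, m in self.keys]
        return (min(countries) <= country <= max(countries)
                and min(months) <= yearmonth <= max(months))


def pack(groups, groups_per_file):
    """Sort the groups and pack a fixed number of them into each file."""
    ordered = sorted(groups)
    return [DataFile(tuple(ordered[i:i + groups_per_file]))
            for i in range(0, len(ordered), groups_per_file)]


def scan(files, country, yearmonth):
    """Return (files read, groups read) for a lookup of one group."""
    hit = [f for f in files if f.may_contain(country, yearmonth)]
    return len(hit), sum(len(f.keys) for f in hit)


groups = list(product(["DE", "FR", "US"],
                      ["202505", "202506", "202507", "202508"]))

partitioned = pack(groups, 1)  # one file per partition: min == max
clustered = pack(groups, 3)    # files straddle partition boundaries

partitioned_scan = scan(partitioned, "FR", "202508")
clustered_scan = scan(clustered, "FR", "202508")
print(partitioned_scan)  # (1, 1)
print(clustered_scan)    # (2, 6)
```

In this toy layout the partitioned table reads exactly the one matching group, while the clustered table reads two files holding six groups, one of which does not contain the requested pair at all. Real tables differ in file sizes, statistics and clustering order, so the numbers here only show the direction of the effect.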


Liquid clustering not improved performance
