
How to Optimize Delta Table Performance in Databricks?

gardenmap
New Contributor II

I'm working with large Delta tables in Databricks and noticing slower performance during read operations. I've already enabled Z-ordering and auto-optimize, but it still feels sluggish at scale. Are there best practices or settings I should adjust for better query performance? Also, is there a way to monitor the impact of each optimization? 
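
For reference, this is roughly what I have in place today (table and column names below are placeholders, not my real schema):

  -- scheduled compaction with Z-ordering on the columns I filter by most
  OPTIMIZE events ZORDER BY (event_date, customer_id);

  -- auto-optimize enabled at the table level
  ALTER TABLE events SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact' = 'true'
  );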

2 REPLIES

BrickByBrick
New Contributor II

Last week, I attended a Dev Connect event in London and came across a new optimization technique called Liquid Clustering (next-gen clustering).
Here are the key benefits of Liquid Clustering over Z-ordering; I'd recommend taking a deep dive into it.

-No need to run OPTIMIZE manually; reduces job scheduling and compute cost.
-Automatically adapts to changing data and query patterns.
-Reduces data skew more effectively than static partitioning + ZORDER.
-Better performance for large-scale, frequently updated tables.
-Simplifies pipeline management; no need to manage clustering logic separately.

Liquid Clustering functionality and automatic clustering improvements are most robust in:
-Databricks Runtime 14.0+
-Unity Catalog-enabled tables
-Delta Lake format (version 2 or higher)
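
If you want to try it, a minimal sketch looks like this (table and column names are examples only; check the docs for your runtime version):

  -- create a new table with Liquid Clustering instead of partitioning/ZORDER
  CREATE TABLE sales (order_id BIGINT, order_date DATE, region STRING)
  CLUSTER BY (order_date, region);

  -- or switch an existing Delta table over to clustering keys
  ALTER TABLE sales CLUSTER BY (order_date, region);

  -- on the monitoring question: each OPTIMIZE/clustering run is recorded in
  -- the table history with operationMetrics (files added/removed, bytes rewritten)
  DESCRIBE HISTORY sales;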

Cheers

igorborba
New Contributor II

Hi @gardenmap, could you share a few more details?

For example, here's what I've done in my case:

  • For tables above 1 TB that can be segregated by date, we've decided to partition by the date column;
  • Whether the table is partitioned or not, we run a sequence of OPTIMIZE (Z-ordering only the specific columns we actually need, not all of the first 32) followed by VACUUM;
  • Since we have many scenarios using MERGE INTO every 5, 10, or 60 minutes, auto optimize needs to be enabled, but we still apply OPTIMIZE with VACUUM at least weekly (rough sketch after this list).
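
As a sketch of that maintenance routine (table and column names are placeholders; adapt to your own schema):

  -- on the date-partitioned tables above 1 TB, only compact recent partitions
  OPTIMIZE orders WHERE order_date >= current_date() - INTERVAL 7 DAYS;

  -- Z-order only on the columns we actually filter on, not all of the first 32
  OPTIMIZE orders ZORDER BY (customer_id);

  -- weekly cleanup, keeping the default 7 days (168 hours) of history
  VACUUM orders RETAIN 168 HOURS;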

A few questions:

  • When working with your tables, do you use the Spark SQL API or Databricks SQL?
  • Are you using Databricks SQL Endpoints?
  • What type and size of cluster are you using, Job Clusters or All-Purpose Clusters? Machines with SSDs?
