
How to Optimize Delta Table Performance in Databricks?

gardenmap
New Contributor II

I'm working with large Delta tables in Databricks and noticing slower performance during read operations. I've already enabled Z-ordering and auto-optimize, but it still feels sluggish at scale. Are there best practices or settings I should adjust for better query performance? Also, is there a way to monitor the impact of each optimization? 
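
For reference, this is roughly what I have in place today (table and column names below are placeholders, not my real schema):

  -- scheduled compaction with Z-ordering on the columns I filter by most
  OPTIMIZE events ZORDER BY (event_date, customer_id);

  -- auto-optimize enabled at the table level
  ALTER TABLE events SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact' = 'true'
  );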

2 REPLIES

BrickByBrick
New Contributor II

Last week, I attended a Dev Connect event in London and came across a new optimization technique called Liquid Clustering (next-gen clustering).
Here are the key benefits of Liquid Clustering over Z-ordering; I'd recommend taking a deep dive into it.

-No need to run OPTIMIZE manually; reduces job scheduling and compute cost.
-Automatically adapts to changing data and query patterns.
-Reduces data skew more effectively than static partitioning + ZORDER.
-Better performance for large-scale, frequently updated tables.
-Simplifies pipeline management; no need to manage clustering logic separately.

Liquid Clustering functionality and automatic clustering improvements are most robust in:
-Databricks Runtime 14.0+
-Unity Catalog-enabled tables
-Delta Lake format (version 2 or higher)
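
If you want to try it, a minimal sketch looks like this (table and column names are examples only; check the docs for your runtime version):

  -- create a new table with Liquid Clustering instead of partitioning/ZORDER
  CREATE TABLE sales (order_id BIGINT, order_date DATE, region STRING)
  CLUSTER BY (order_date, region);

  -- or switch an existing Delta table over to clustering keys
  ALTER TABLE sales CLUSTER BY (order_date, region);

  -- on the monitoring question: each OPTIMIZE/clustering run is recorded in
  -- the table history with operationMetrics (files added/removed, bytes rewritten)
  DESCRIBE HISTORY sales;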

Cheers

igorborba
New Contributor II

Hi @gardenmap, could you share a few more details?

For example, here's what I've done in my case:

  • For tables above 1 TB that can be segregated by date, we've decided to partition by the date column;
  • Whether the table is partitioned or not, we run a sequence of OPTIMIZE (Z-ordering only the specific columns we actually need, not all of the first 32) followed by VACUUM;
  • Since we have many scenarios using MERGE INTO every 5, 10, or 60 minutes, auto optimize needs to be enabled, but we still apply OPTIMIZE with VACUUM at least weekly (rough sketch after this list).
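
As a sketch of that maintenance routine (table and column names are placeholders; adapt to your own schema):

  -- on the date-partitioned tables above 1 TB, only compact recent partitions
  OPTIMIZE orders WHERE order_date >= current_date() - INTERVAL 7 DAYS;

  -- Z-order only on the columns we actually filter on, not all of the first 32
  OPTIMIZE orders ZORDER BY (customer_id);

  -- weekly cleanup, keeping the default 7 days (168 hours) of history
  VACUUM orders RETAIN 168 HOURS;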

A few questions:

  • When working with your tables, do you use the Spark SQL API or Databricks SQL?
  • Are you using Databricks SQL Endpoints?
  • What type and size of cluster are you using, Job Clusters or All-Purpose Clusters? Machines with SSDs?
