Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What's the best practice on running ANALYZE on Delta Tables for query performance optimization?

aladda
Databricks Employee
 
Accepted Solutions

aladda
Databricks Employee
  • The ANALYZE command captures statistics specifically for the cost-based optimizer (CBO), so that it can make better decisions.
  • The statistics that Delta auto-collects on the first 32 columns are used specifically for data skipping. They are separate from the statistics gathered by the ANALYZE command.
  • The docs currently say 'Do not run on Delta tables' because it's best to run ANALYZE on Delta tables after any data update/delete operation has completed and once the data has changed by around 10%. This gives the CBO the most up-to-date statistics to work with.
  • General best practices:
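The timing guideline above (re-run ANALYZE once completed update/delete operations have changed roughly 10% of the data) can be sketched as a small helper. This is a minimal sketch, not a Databricks feature: the `TableChangeStats` record, the `should_reanalyze` and `analyze_statement` helpers, and the `sales.orders` table name are all hypothetical, and it assumes you track changed-row counts yourself (for example from your own job metrics). Only the generated `ANALYZE TABLE ... COMPUTE STATISTICS` statement is standard Spark SQL; it feeds the CBO and does not touch the data-skipping statistics Delta collects automatically.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumption: the ~10% figure from the answer above, used as a tunable default.
REANALYZE_THRESHOLD = 0.10


@dataclass
class TableChangeStats:
    """Hypothetical bookkeeping you maintain yourself; this is not a Delta API."""
    table_name: str
    rows_at_last_analyze: int  # row count when ANALYZE last ran (0 if never)
    rows_changed_since: int    # rows inserted + updated + deleted since then


def should_reanalyze(stats: TableChangeStats,
                     threshold: float = REANALYZE_THRESHOLD) -> bool:
    """Return True once the fraction of changed rows reaches the threshold."""
    if stats.rows_at_last_analyze <= 0:
        # No usable baseline: collect statistics as soon as the table has changes.
        return stats.rows_changed_since > 0
    return stats.rows_changed_since / stats.rows_at_last_analyze >= threshold


def analyze_statement(table_name: str,
                      columns: Optional[Sequence[str]] = None) -> str:
    """Build a Spark SQL ANALYZE TABLE statement that collects CBO statistics."""
    if columns:
        return (f"ANALYZE TABLE {table_name} "
                f"COMPUTE STATISTICS FOR COLUMNS {', '.join(columns)}")
    return f"ANALYZE TABLE {table_name} COMPUTE STATISTICS FOR ALL COLUMNS"


if __name__ == "__main__":
    # Example: 120k of 1M rows changed by completed update/delete jobs (12% >= 10%).
    stats = TableChangeStats("sales.orders", rows_at_last_analyze=1_000_000,
                             rows_changed_since=120_000)
    if should_reanalyze(stats):
        sql = analyze_statement(stats.table_name)
        print(sql)
        # In a Databricks notebook you would then run it with: spark.sql(sql)
```

Running the script prints `ANALYZE TABLE sales.orders COMPUTE STATISTICS FOR ALL COLUMNS`. Passing an explicit column list restricts collection to those columns, which may be cheaper on wide tables when only a few columns appear in joins and filters.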


Replies

User16826994223
Honored Contributor III

Nicely written.

jlickt
New Contributor II

Super write-up; very useful in understanding how the Delta and non-Delta approaches have evolved.
