cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

What's the best practice on running ANALYZE on Delta Tables for query performance optimization?

Anand_Ladda
Honored Contributor II
 
1 ACCEPTED SOLUTION

Accepted Solutions

Anand_Ladda
Honored Contributor II
  • The ANALYZE Command specifically captures statistics which are relevant for the Cost Based Optimizer to make better decisions.
  • The 32 columns of statistics that Delta auto-collects are specifically for data skipping. This is separate from the ANALYZE command
  • The reason docs currently say Do not run on Delta tablesโ€™ is because Its best to run Analyze on Delta tables after completion of any data update/delete operation and when the data has changed by around 10%. This gives the CBO the best and most up-to-date statistics to work with
  • General best practices:

View solution in original post

2 REPLIES 2

User16826994223
Honored Contributor III

Nicely Written

Anand_Ladda
Honored Contributor II
  • The ANALYZE Command specifically captures statistics which are relevant for the Cost Based Optimizer to make better decisions.
  • The 32 columns of statistics that Delta auto-collects are specifically for data skipping. This is separate from the ANALYZE command
  • The reason docs currently say Do not run on Delta tablesโ€™ is because Its best to run Analyze on Delta tables after completion of any data update/delete operation and when the data has changed by around 10%. This gives the CBO the best and most up-to-date statistics to work with
  • General best practices:

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.