cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

COMPUTE STATISTICS- QUERY OPTIMIZER

Sainath368
New Contributor III

I ran ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS on my tables to calculate column-level statistics at the file level for each Parquet file inside the Delta table. Now, Iโ€™m wondering which command is better to run next: ANALYZE TABLE <table_name> COMPUTE STATISTICS or ANALYZE TABLE <table_name> COMPUTE STATISTICS FOR ALL COLUMNS? Since I already have column-level stats computed via delta statistics, I want to understand the difference and what the query optimizer uses, how it works?

1 REPLY 1

SP_6721
Contributor III

Hi @Sainath368 ,

From what I understand, COMPUTE DELTA STATISTICS generates Delta statistics that are mainly used for data skipping, helping speed up table scans by avoiding unnecessary file reads. However, these stats aren't used by the query optimizer.

If you want to help the optimizer make better decisions, you'll need to run COMPUTE STATISTICS FOR ALL COLUMNS.

  • COMPUTE STATISTICS gathers table-level statistics,
  • COMPUTE STATISTICS FOR ALL COLUMNS collects both table-level and column-level statistics.

These stats allow the optimizer to estimate table sizes and column data characteristics, which improves query planning like deciding which table to scan first or whether to broadcast a table in a join