cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

ANALYZE TABLE is not updating columns stats

vlado101
New Contributor II

Hello everyone,

So I am having an issue when running "ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS". The way I understand it this should update the min/max value for a column when you run it for all or one column. One way to verify it from what I know is to use "DESCRIBE EXTENDED table_name".

The problem is that when I run ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS and then use DESCRIBE EXTENDED on the table I see that that for under the "comment" column all the values are null. Per my understanding they should not be null, but have min/max values or at least something other than null.

I am also doing this for EXTERNAL tables. Did anyone run into the same issue?

1 ACCEPTED SOLUTION

Accepted Solutions

Priyanka_Biswas
Databricks Employee
Databricks Employee

Hello @vlado101 

The ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks is used to compute statistics for all columns of a table. This information is persisted in the metastore and helps the query optimizer make decisions such as choosing the proper join type, selecting the correct build side in a hash-join, and calibrating the join order in a multi-way join.

However, it seems you are expecting the ANALYZE TABLE command to populate the "comment" column when you run the DESCRIBE EXTENDED table_name command. This is not correct. The "comment" column in the output of DESCRIBE EXTENDED is not related to the statistics computed by ANALYZE TABLE. The "comment" column is used to store any comments that have been added to the table or column when it was created or altered, not the statistics.
If you want to view the statistics of a table, you can use the DESCRIBE EXTENDED command, but you should look at the "statistics" field in the output, not the "comment" field. If the statistics are not being updated as expected, it could be due to a number of reasons, such as the table not being analyzed correctly, or a delay in the update of the metastore.

Here is an example of how you can use the DESCRIBE EXTENDED command to view the statistics of a table:
sql
DESCRIBE EXTENDED table_name;
Look for the "statistics" field in the output to see the computed statistics.

View solution in original post

1 REPLY 1

Priyanka_Biswas
Databricks Employee
Databricks Employee

Hello @vlado101 

The ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks is used to compute statistics for all columns of a table. This information is persisted in the metastore and helps the query optimizer make decisions such as choosing the proper join type, selecting the correct build side in a hash-join, and calibrating the join order in a multi-way join.

However, it seems you are expecting the ANALYZE TABLE command to populate the "comment" column when you run the DESCRIBE EXTENDED table_name command. This is not correct. The "comment" column in the output of DESCRIBE EXTENDED is not related to the statistics computed by ANALYZE TABLE. The "comment" column is used to store any comments that have been added to the table or column when it was created or altered, not the statistics.
If you want to view the statistics of a table, you can use the DESCRIBE EXTENDED command, but you should look at the "statistics" field in the output, not the "comment" field. If the statistics are not being updated as expected, it could be due to a number of reasons, such as the table not being analyzed correctly, or a delay in the update of the metastore.

Here is an example of how you can use the DESCRIBE EXTENDED command to view the statistics of a table:
sql
DESCRIBE EXTENDED table_name;
Look for the "statistics" field in the output to see the computed statistics.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group