ANALYZE command for stats collection - distinct_count difference for a column

Sainath368
Contributor

I ran the command ANALYZE TABLE <TABLE_NAME> COMPUTE STATISTICS FOR COLUMNS 'COL1,COL2' to compute statistics for specific columns, but I noticed that the distinct counts returned by this command differ from the results obtained by running a direct COUNT(DISTINCT COL1) query.  My question is does this ANALYZE TABLE <TABLE_NAME> COMPUTE STATISTICS FOR COLUMNS 'COL1,COL2' command calculate statistics accurately? If it does, what could explain the difference I’m seeing between the distinct counts from this command and the distinct counts from running a direct COUNT(DISTINCT COL1) query on the same table?

Here are the attached screenshots for reference:

describe  extended <table_name> col1

Sainath368_2-1751986880556.png

select count(distinct(col1)) from <table_name>

Sainath368_3-1751986940034.png