FOR COLUMNS Not Supported in Delta ANALYZE?

pooja_bhumandla
Databricks Partner

Hi everyone,

I’m running into the following error when trying to run ANALYZE on a Delta table:


[INVALID_SQL_SYNTAX.ANALYZE_TABLE_DELTA_STATS_UNEXPECTED_TOKEN] Invalid SQL syntax: ANALYZE TABLE(S) ... COMPUTE DELTA STATISTICS FOR doesn't support: FOR ALL COLUMNS, FOR COLUMNS, NOSCAN, and PARTITION clauses. SQLSTATE: 42000

using 

   ANALYZE TABLE your_catalog.your_schema.your_table
   COMPUTE DELTA STATISTICS
   FOR COLUMNS col33, col34, col35, col36, col37, col38, col39, col40, col41, col42;

My intent was to compute statistics only for a subset of columns (using FOR COLUMNS) after recently adding some new columns to dataSkippingStats.

My questions:

  1. Does Delta Lake support column-level ANALYZE at all (FOR COLUMNS / FOR ALL COLUMNS)?
  2. If not, what is the correct way to recompute stats only for the newly added columns?
  3. Is re-running ANALYZE TABLE table_name COMPUTE DELTA STATISTICS the only option?
    (Even if it recomputes stats for all columns?)

Additional context:

  • Initially had 32 columns in dataSkippingStats → stats were persisted for those.
  • Added few new columns (suppose 15 cols) recently.
  • Want to compute stats only for the newly added ones, if possible.

Any clarification would be appreciated!

Thanks!