Databricks Community

sd1700092 · 4 weeks ago

Hi Databricks Support,

We need help confirming whether this is a known DBR 15.4 LTS bug or an unsupported/configuration-specific behavior.

Summary

On a Databricks Runtime 15.4.40 Photon job cluster, `ANALYZE TABLE <catalog>.<schema>.<table> COMPUTE STATISTICS FOR ALL COLUMNS` completes successfully but does not update table-level or column-level optimizer statistics for a Unity Catalog managed Delta table.

The same SQL run from a DBSQL Serverless warehouse works correctly and refreshes the column stats immediately.

Environment

- Runtime: Databricks Runtime 15.4.40 LTS
- Engine: Photon
- Cluster type: job cluster
- Table type: Unity Catalog managed Delta table
- Table properties:
  - `delta.universalFormat.enabledFormats = iceberg`
  - `delta.enableIcebergCompatV2 = true`
  - `delta.columnMapping.mode = name`
  - `delta.dataSkippingNumIndexedCols = 50`
  - `delta.checkpointPolicy = classic`
  - `delta.enableDeletionVectors = false`
- Predictive Optimization: disabled/inherited disabled at catalog level

Affected table

Example table:

`stg_silver_internal.payment.payment_merchant_read_merchant_preauth_transaction`

Reproduction

In the same DBR 15.4.40 Photon job-cluster session:

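Run a silver job that executes `ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS`. The statement completes without error, but it has no effect: neither the table-level nor the column-level statistics are updated.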
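For reference, the reproduction and the check used to observe it can be sketched roughly as below. This is only a sketch: the angle-bracket placeholders have to be replaced with real names, and `<column>` stands for any column of the affected table.

```sql
-- Run inside the DBR 15.4.40 Photon job-cluster session.
ANALYZE TABLE <catalog>.<schema>.<table> COMPUTE STATISTICS FOR ALL COLUMNS;

-- Table-level stats: check the Statistics row in the detailed table information.
DESCRIBE TABLE EXTENDED <catalog>.<schema>.<table>;

-- Column-level stats (min, max, num_nulls, distinct_count, ...) for a single column.
DESCRIBE TABLE EXTENDED <catalog>.<schema>.<table> <column>;
```

On the job cluster the `DESCRIBE` output is unchanged after the `ANALYZE` completes.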

Comparison

Running the same `ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS` statement from a DBSQL Serverless warehouse works correctly. The `DESCRIBE TABLE EXTENDED <table> <column>` stats refresh immediately afterward.

Things already tried

- Fully qualified three-part table name
- `USE CATALOG <catalog>`
- `REFRESH TABLE <table>`; the result says a refresh was not needed
- Toggling `spark.databricks.delta.uniform.iceberg.sync.convert.enabled`
- Verifying that the same session sees the actual latest data via `SELECT max(esupdatedat)`

None of these changed the DBR job-cluster behavior.
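As statements, the attempts above look roughly like this. It is a sketch of what was run, not an exact transcript: the order and the value shown for the config toggle are illustrative, and the placeholders need real names.

```sql
-- Set the catalog context explicitly before re-running ANALYZE.
USE CATALOG <catalog>;

-- Invalidate cached metadata; this reported that no refresh was needed.
REFRESH TABLE <catalog>.<schema>.<table>;

-- Toggle the UniForm sync conversion setting (tried both true and false).
SET spark.databricks.delta.uniform.iceberg.sync.convert.enabled = false;

-- Confirm the session reads the latest data.
SELECT max(esupdatedat) FROM <catalog>.<schema>.<table>;
```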

Questions

- Is this a known issue in DBR 15.4 LTS for UC managed Delta tables with UniForm/Iceberg compatibility and column mapping?
- Is `ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS` expected to work from an all-purpose/job cluster in this table configuration?
- Is the supported workaround to run optimizer-stat collection from DBSQL Serverless / SQL warehouse instead of DBR job clusters?
- Should we use `ANALYZE TABLE ... COMPUTE DELTA STATISTICS` in addition to regular `COMPUTE STATISTICS`, or is that only for Delta file-skipping stats and not a replacement for CBO column stats?
- Is this fixed in a newer DBR runtime such as DBR 16.x?

Please advise the recommended production-safe workaround and whether this should be treated as a DBR 15.4 LTS bug.

Thanks.

Ashwin_DSA · 4 weeks ago

Hi @sd1700092,

From what I can verify, this looks more like a DBR 15.4 job-cluster issue than expected behaviour. The public ANALYZE TABLE documentation is clear that ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS applies to both Databricks Runtime and Databricks SQL, and that FOR ALL COLUMNS should collect both table-level and column-level statistics for the query optimiser.

I also couldn’t find any public documentation saying this is unsupported for Unity Catalog managed Delta tables that use UniForm/Iceberg compatibility or column mapping. In other words, based on the docs, this should work from a job or all-purpose cluster as well. The statement succeeding without refreshing optimiser stats on DBR 15.4, while the same statement works immediately from a DBSQL Serverless warehouse, points more toward a runtime-specific bug or regression than toward an unsupported configuration.

It’s also worth noting that ANALYZE TABLE ... COMPUTE DELTA STATISTICS is not a replacement for COMPUTE STATISTICS FOR ALL COLUMNS. The same docs explicitly say that when DELTA is specified, normal optimiser statistics are not collected. That command is for Delta log and data-skipping statistics, whereas COMPUTE STATISTICS is the one used for cost-based optimisation and query planning.
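To make the distinction concrete, the two variants side by side look roughly like this. It is a sketch with placeholder names; both forms follow the documented `ANALYZE TABLE` syntax, and they can be run together because they populate different kinds of statistics.

```sql
-- Optimiser (CBO) statistics: table-level and column-level stats used for query planning.
ANALYZE TABLE <catalog>.<schema>.<table> COMPUTE STATISTICS FOR ALL COLUMNS;

-- Delta log statistics used for data skipping; does NOT collect optimiser statistics.
ANALYZE TABLE <catalog>.<schema>.<table> COMPUTE DELTA STATISTICS;
```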

So if you want a production-safe workaround today, the recommendation would be to run optimiser stats collection from a SQL warehouse, since that path is behaving correctly in your testing and is also a documented execution environment for ANALYZE TABLE. If the table is a Unity Catalog managed table, it is also worth noting that Databricks recommends predictive optimization for this exact area, because it automatically runs ANALYZE on UC managed tables to keep optimizer stats current.
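If you decide to try predictive optimization, enabling it at the catalog or schema level looks roughly like the sketch below. This assumes predictive optimization is available for your account and region and that you have the required privileges on the object, so please check the predictive optimization docs before applying it. The placeholders stand in for your own names.

```sql
-- Enable for every managed table in the catalog (schemas and tables inherit the setting).
ALTER CATALOG <catalog> ENABLE PREDICTIVE OPTIMIZATION;

-- Or enable for a single schema only.
ALTER SCHEMA <catalog>.<schema> ENABLE PREDICTIVE OPTIMIZATION;
```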

On the question of newer runtimes, I didn’t find a public fix note that specifically says this exact behaviour is resolved in DBR 16.x. What I did find in the public cost-based optimizer documentation is that DBR 16.0 and above adds better visibility in EXPLAIN, including whether referenced tables have missing, partial, or full statistics, which at least makes validation and troubleshooting easier.
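If you do test on DBR 16.0 or above, a quick validation could look like the sketch below. The query itself is arbitrary and only there to reference the table; `<column>` is a placeholder, and the exact wording of the statistics section in the plan output is something to check on your own cluster.

```sql
-- On DBR 16.0+, the EXPLAIN output reports whether referenced tables
-- have missing, partial, or full statistics.
EXPLAIN
SELECT count(*)
FROM <catalog>.<schema>.<table>
WHERE <column> IS NOT NULL;
```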

I'd recommend raising a Databricks support ticket to check whether this is a runtime-specific bug, as support teams don't pick up such requests from community posts.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***
