cancel
Showing results for 
Search instead for 
Did you mean: 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results for 
Search instead for 
Did you mean: 

Behaviour of ANALYZE command varying when using different clusters and table types.

yshah
New Contributor

Certain tables have this configuration enabled, whereas others do not have it.
Delta.checkpointPolicy=v2

This is affecting the behavior of the ANALYZE command

If flag is enabled : Table stats are not visible after doing the DESCRIBE command using SINGLE user cluster
If flag is disabled : Table stats visible in standard "table properties" as key-value pairs using SINGLE user cluster

Please help us understand why this change in behavior exists for different cluster types and for tables with this flag

 

1 ACCEPTED SOLUTION

Accepted Solutions

BigRoux
Databricks Employee
Databricks Employee

@yshah this is a great question. Let me explain what's happening:

The Delta Lake table property `delta.checkpointPolicy=v2` changes how and where table statistics are stored and displayed when you run ANALYZE and DESCRIBE TABLE commands.

Classic vs V2 Checkpoint Policy

With `delta.checkpointPolicy=classic`:
Table stats are saved in the transaction log and shown as key-value pairs in table properties, which you can readily see using DESCRIBE TABLE—even on single-user clusters.

With `delta.checkpointPolicy=v2` enabled:
Stats are stored in optimized checkpoint files (such as manifests or sidecars), not as key-value pairs in table properties. As a result, DESCRIBE TABLE does not display these stats for tables with v2 checkpointing.

Why This Change Matters

The reason for this change is to boost performance and reduce metadata costs—especially important for streaming and high-frequency workloads. However, it also means some legacy behaviors and tools that expect stats in table properties will no longer see them unless you use the classic policy.

Recommendation

If you need stats to show up in table properties for use with legacy workflows or third-party tools, stick with `delta.checkpointPolicy=classic`. If you prefer better metadata efficiency and don't require stats in table properties, v2 is recommended.

 

Hope this helps. Louis.

View solution in original post

3 REPLIES 3

BigRoux
Databricks Employee
Databricks Employee

@yshah this is a great question. Let me explain what's happening:

The Delta Lake table property `delta.checkpointPolicy=v2` changes how and where table statistics are stored and displayed when you run ANALYZE and DESCRIBE TABLE commands.

Classic vs V2 Checkpoint Policy

With `delta.checkpointPolicy=classic`:
Table stats are saved in the transaction log and shown as key-value pairs in table properties, which you can readily see using DESCRIBE TABLE—even on single-user clusters.

With `delta.checkpointPolicy=v2` enabled:
Stats are stored in optimized checkpoint files (such as manifests or sidecars), not as key-value pairs in table properties. As a result, DESCRIBE TABLE does not display these stats for tables with v2 checkpointing.

Why This Change Matters

The reason for this change is to boost performance and reduce metadata costs—especially important for streaming and high-frequency workloads. However, it also means some legacy behaviors and tools that expect stats in table properties will no longer see them unless you use the classic policy.

Recommendation

If you need stats to show up in table properties for use with legacy workflows or third-party tools, stick with `delta.checkpointPolicy=classic`. If you prefer better metadata efficiency and don't require stats in table properties, v2 is recommended.

 

Hope this helps. Louis.

yshah
New Contributor

What would the best approach be to access the table column statistics if checkpoint V2 is enabled?

BigRoux
Databricks Employee
Databricks Employee

Greetings @yshah , here are some helpful hints/tips/tricks to guide you.

To access table column statistics when checkpoint V2 is enabled, you can follow these guidelines:

  1. Utilize Databricks Runtime 13.3 LTS or Higher: Ensure that you are using Databricks Runtime 13.3 LTS or above, which supports reading and writing tables with v2 checkpoints.

  2. Leverage Automatic Liquid Clustering: With checkpoint V2, you are able to enable automatic liquid clustering for Unity Catalog managed Delta tables. Using the CLUSTER BY AUTO clause allows Databricks to intelligently choose clustering keys based on historical query workloads. This can help optimize performance while accessing column statistics effectively.

  3. Check for Collected Statistics: By default, the first 32 columns in a Delta table have statistics collected, which can be helpful when querying and utilizing those statistics for performance improvements.

  4. Querying Statistical Data: Use SQL commands like DESCRIBE TABLE table_name; or DESCRIBE DETAIL table_name; to retrieve metadata and statistics for each column. These commands work well in environments where v2 checkpoints are used.

  5. Implement Predictive Optimization: For efficient clustering and access to statistics, ensure that predictive optimization is enabled for the table. This allows the system to re-evaluate and update the clustering keys based on changing query patterns over time.

In summary, use the appropriate Databricks Runtime, enable automatic liquid clustering, ensure statistics are collected, and use relevant SQL commands to access column statistics when working with tables using checkpoint V2.

 

Hope this helps, Louis.

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now