<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Behaviour of ANALYZE command varying when using different clusters and table types. in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/133275#M2269</link>
    <description>Behaviour of ANALYZE command varying when using different clusters and table types.</description>
    <pubDate>Mon, 29 Sep 2025 14:40:29 GMT</pubDate>
    <dc:creator>Louis_Frolio</dc:creator>
    <dc:date>2025-09-29T14:40:29Z</dc:date>
    <item>
      <title>Behaviour of ANALYZE command varying when using different clusters and table types.</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/132987#M2259</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Some of our tables have the following property set, while others do not:&lt;BR /&gt;delta.checkpointPolicy=v2&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;This affects the behavior of the ANALYZE command.&lt;/P&gt;&lt;P&gt;If the flag is enabled: table stats are not visible after running DESCRIBE on a single-user cluster.&lt;BR /&gt;If the flag is disabled: table stats are visible in the standard "table properties" as key-value pairs on a single-user cluster.&lt;BR /&gt;&lt;BR /&gt;Please help us understand why the behavior differs across cluster types and for tables with this flag.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Sep 2025 04:13:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/132987#M2259</guid>
      <dc:creator>yshah</dc:creator>
      <dc:date>2025-09-25T04:13:27Z</dc:date>
    </item>
    <item>
      <title>Re: Behaviour of ANALYZE command varying when using different clusters and table types.</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/133098#M2266</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/186516"&gt;@yshah&lt;/a&gt; this is a great question. Let me explain what's happening:&lt;/P&gt;
&lt;P&gt;The Delta Lake table property &lt;CODE&gt;delta.checkpointPolicy=v2&lt;/CODE&gt; changes how and where table statistics are stored and displayed when you run the ANALYZE and DESCRIBE TABLE commands.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Classic vs V2 Checkpoint Policy&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;With &lt;CODE&gt;delta.checkpointPolicy=classic&lt;/CODE&gt;:&lt;/STRONG&gt;&lt;BR /&gt;Table stats are saved in the transaction log and shown as key-value pairs in table properties, which you can see with DESCRIBE TABLE, even on single-user clusters.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;With &lt;CODE&gt;delta.checkpointPolicy=v2&lt;/CODE&gt; enabled:&lt;/STRONG&gt;&lt;BR /&gt;Stats are stored in the v2 checkpoint files (for example, sidecar files), not as key-value pairs in table properties. As a result, DESCRIBE TABLE does not display these stats for tables with v2 checkpointing.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why This Change Matters&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This change is intended to boost performance and reduce metadata costs, which matters most for streaming and high-frequency workloads. However, it also means that legacy behaviors and tools that expect stats in table properties will no longer see them unless you use the classic policy.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Recommendation&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;If you need stats to show up in table properties for legacy workflows or third-party tools, stick with &lt;CODE&gt;delta.checkpointPolicy=classic&lt;/CODE&gt;. If you prefer better metadata efficiency and don't require stats in table properties, v2 is recommended.&lt;/P&gt;
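&lt;P&gt;As a rough sketch of acting on that recommendation, the Python helper below builds the SQL to inspect a table's current checkpoint policy and to set it. The helper name &lt;CODE&gt;checkpoint_policy_statements&lt;/CODE&gt; and the table name &lt;CODE&gt;main.sales.orders&lt;/CODE&gt; are made up for illustration, and it assumes your runtime allows changing &lt;CODE&gt;delta.checkpointPolicy&lt;/CODE&gt; on the table in question, so check the result on a non-production table first.&lt;/P&gt;

```python
def checkpoint_policy_statements(table: str, policy: str = "classic") -> list[str]:
    """Build SQL to inspect, then set, delta.checkpointPolicy on a table.

    Hypothetical helper for illustration; it only builds strings and does
    not talk to a cluster.
    """
    if policy not in ("classic", "v2"):
        raise ValueError(f"unsupported checkpoint policy: {policy!r}")
    return [
        # Show the value currently stored for the property.
        f"SHOW TBLPROPERTIES {table} ('delta.checkpointPolicy')",
        # Set the property to the requested policy.
        f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.checkpointPolicy' = '{policy}')",
    ]


# "main.sales.orders" is a placeholder table name.
for statement in checkpoint_policy_statements("main.sales.orders"):
    print(statement)
```

&lt;P&gt;In a Databricks notebook you could pass each string to &lt;CODE&gt;spark.sql(...)&lt;/CODE&gt;. Run only the first statement if you just want to see which policy a table uses.&lt;/P&gt;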
&lt;P&gt;Hope this helps. Louis.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Sep 2025 16:38:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/133098#M2266</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-09-26T16:38:45Z</dc:date>
    </item>
    <item>
      <title>Re: Behaviour of ANALYZE command varying when using different clusters and table types.</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/133213#M2267</link>
      <description>&lt;P&gt;What would the best approach be to access the table column statistics if checkpoint V2 is enabled?&lt;/P&gt;</description>
      <pubDate>Mon, 29 Sep 2025 05:59:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/133213#M2267</guid>
      <dc:creator>yshah</dc:creator>
      <dc:date>2025-09-29T05:59:57Z</dc:date>
    </item>
    <item>
      <title>Re: Behaviour of ANALYZE command varying when using different clusters and table types.</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/133275#M2269</link>
      <description>&lt;P&gt;Greetings &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/186516"&gt;@yshah&lt;/a&gt;, here are some tips to guide you.&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;To access table column statistics when checkpoint V2 is enabled, you can follow these guidelines:&lt;/P&gt;
&lt;OL class="qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;&lt;STRONG&gt;Utilize Databricks Runtime 13.3 LTS or Higher&lt;/STRONG&gt;: Ensure that you are using Databricks Runtime 13.3 LTS or above, which supports reading and writing tables with v2 checkpoints.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;&lt;STRONG&gt;Leverage Automatic Liquid Clustering&lt;/STRONG&gt;: With checkpoint V2, you can enable automatic liquid clustering for Unity Catalog managed Delta tables. The &lt;CODE class="qt3gz9f"&gt;CLUSTER BY AUTO&lt;/CODE&gt; clause lets Databricks choose clustering keys based on historical query workloads. This can improve query performance, although it concerns data layout rather than how column statistics are displayed.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;&lt;STRONG&gt;Check for Collected Statistics&lt;/STRONG&gt;: By default, the first 32 columns in a Delta table have statistics collected, which can be helpful when querying and utilizing those statistics for performance improvements.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;&lt;STRONG&gt;Querying Statistical Data&lt;/STRONG&gt;: Use &lt;CODE class="qt3gz9f"&gt;DESCRIBE TABLE EXTENDED table_name column_name;&lt;/CODE&gt; to display the statistics recorded for a single column after ANALYZE has computed them, and &lt;CODE class="qt3gz9f"&gt;DESCRIBE DETAIL table_name;&lt;/CODE&gt; for table-level metadata such as file count and size. Plain &lt;CODE class="qt3gz9f"&gt;DESCRIBE TABLE table_name;&lt;/CODE&gt; lists the schema only. These commands can be run against tables that use v2 checkpoints.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;&lt;STRONG&gt;Implement Predictive Optimization&lt;/STRONG&gt;: For efficient clustering and access to statistics, ensure that predictive optimization is enabled for the table. This allows the system to re-evaluate and update the clustering keys based on changing query patterns over time.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
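&lt;P class="qt3gz91 paragraph"&gt;For the per-file statistics mentioned in step 3, here is a minimal Python sketch of how they appear in a Delta commit file: each &lt;CODE class="qt3gz9f"&gt;add&lt;/CODE&gt; action carries a &lt;CODE class="qt3gz9f"&gt;stats&lt;/CODE&gt; field holding a JSON string with &lt;CODE class="qt3gz9f"&gt;numRecords&lt;/CODE&gt;, &lt;CODE class="qt3gz9f"&gt;minValues&lt;/CODE&gt;, &lt;CODE class="qt3gz9f"&gt;maxValues&lt;/CODE&gt; and &lt;CODE class="qt3gz9f"&gt;nullCount&lt;/CODE&gt;. The sample line and the &lt;CODE class="qt3gz9f"&gt;file_stats&lt;/CODE&gt; helper are invented for illustration, and these file-level statistics are not necessarily the same ones that ANALYZE reports.&lt;/P&gt;

```python
import json


def file_stats(log_line):
    """Return (path, stats) for an 'add' action line, or None otherwise.

    Hypothetical helper: parses one line of a Delta JSON commit file.
    """
    action = json.loads(log_line)
    add = action.get("add")
    if add is None or not add.get("stats"):
        return None
    # The stats field is itself a JSON-encoded string.
    return add["path"], json.loads(add["stats"])


# Made-up commit line shaped like an 'add' action.
sample_line = json.dumps({
    "add": {
        "path": "part-00000.snappy.parquet",
        "partitionValues": {},
        "size": 1024,
        "modificationTime": 0,
        "dataChange": True,
        "stats": json.dumps({
            "numRecords": 3,
            "minValues": {"id": 1},
            "maxValues": {"id": 3},
            "nullCount": {"id": 0},
        }),
    }
})

path, stats = file_stats(sample_line)
print(path, stats["numRecords"], stats["minValues"], stats["maxValues"])
```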
&lt;P class="qt3gz91 paragraph"&gt;In summary, use the appropriate Databricks Runtime, enable automatic liquid clustering, ensure statistics are collected, and use relevant SQL commands to access column statistics when working with tables using checkpoint V2.&lt;/P&gt;
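&lt;P class="qt3gz91 paragraph"&gt;To make the querying step concrete, here is a small Python sketch that builds the SQL to compute statistics for specific columns and then display them one column at a time. The helper name &lt;CODE class="qt3gz9f"&gt;column_stats_statements&lt;/CODE&gt;, the table name and the column names are placeholders, and whether the output is populated on a v2-checkpoint table is something to confirm in your own workspace.&lt;/P&gt;

```python
def column_stats_statements(table, columns):
    """Build SQL that computes, then displays, statistics for the given columns.

    Hypothetical helper for illustration; it only builds strings.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Compute optimizer statistics for the listed columns.
    analyze = f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {', '.join(columns)}"
    # DESCRIBE TABLE EXTENDED with a column name shows that column's statistics.
    describes = [f"DESCRIBE TABLE EXTENDED {table} {col}" for col in columns]
    return [analyze, *describes]


# Placeholder table and column names.
for statement in column_stats_statements("main.sales.orders", ["order_id", "amount"]):
    print(statement)
```

&lt;P class="qt3gz91 paragraph"&gt;In a notebook, each statement could be run with &lt;CODE class="qt3gz9f"&gt;spark.sql(statement).show(truncate=False)&lt;/CODE&gt;.&lt;/P&gt;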
&lt;P class="qt3gz91 paragraph"&gt;Hope this helps, Louis.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Sep 2025 14:40:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/behaviour-of-analyze-command-varying-when-using-different/m-p/133275#M2269</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-09-29T14:40:29Z</dc:date>
    </item>
  </channel>
</rss>

