<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Liquid Clustering in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/liquid-clustering/m-p/99944#M8782</link>
    <description>&lt;P&gt;How can I use a column for liquid clustering if it is not among the first 32 columns of my Delta table schema?&lt;/P&gt;</description>
    <pubDate>Mon, 25 Nov 2024 11:57:58 GMT</pubDate>
    <dc:creator>hrishiharsh25</dc:creator>
    <dc:date>2024-11-25T11:57:58Z</dc:date>
    <item>
      <title>Liquid Clustering</title>
      <link>https://community.databricks.com/t5/get-started-discussions/liquid-clustering/m-p/99944#M8782</link>
      <description>&lt;P&gt;How can I use a column for liquid clustering if it is not among the first 32 columns of my Delta table schema?&lt;/P&gt;</description>
      <pubDate>Mon, 25 Nov 2024 11:57:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/liquid-clustering/m-p/99944#M8782</guid>
      <dc:creator>hrishiharsh25</dc:creator>
      <dc:date>2024-11-25T11:57:58Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid Clustering</title>
      <link>https://community.databricks.com/t5/get-started-discussions/liquid-clustering/m-p/99958#M8783</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Only columns that have statistics collected can be specified as clustering keys. By default, statistics are collected for the first 32 columns of a Delta table. See&amp;nbsp;&lt;/SPAN&gt;&lt;A class="reference internal" href="https://docs.databricks.com/en/delta/data-skipping.html#stats-cols" target="_blank"&gt;&lt;SPAN class="std std-ref"&gt;Specify Delta statistics columns&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can use the following workaround for your use case:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;1. Set the following table property to name the column that you want to use for liquid clustering:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;delta.dataSkippingStatsColumns&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This property specifies a comma-separated list of column names for which Delta Lake collects statistics. It supersedes&amp;nbsp;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;dataSkippingNumIndexedCols&lt;/SPAN&gt;&lt;/CODE&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Table properties can be set at table creation or with&amp;nbsp;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;ALTER&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN class="pre"&gt;TABLE&lt;/SPAN&gt;&lt;/CODE&gt;&amp;nbsp;statements. See&amp;nbsp;&lt;A class="reference internal" href="https://docs.databricks.com/en/delta/table-properties.html" target="_blank"&gt;&lt;SPAN class="doc"&gt;Delta table properties reference&lt;/SPAN&gt;&lt;/A&gt;.&lt;/SPAN&gt;&lt;/P&gt;
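&lt;P&gt;&lt;SPAN&gt;As a minimal sketch of step 1 using&amp;nbsp;&lt;CODE&gt;ALTER TABLE&lt;/CODE&gt;: the names&amp;nbsp;&lt;CODE&gt;table_name&lt;/CODE&gt; and&amp;nbsp;&lt;CODE&gt;my_cluster_col&lt;/CODE&gt; are placeholders (the column name is hypothetical and stands for a column beyond the first 32). Because this property supersedes dataSkippingNumIndexedCols, you would likely want to list every other column you still need statistics for as well:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;-- Placeholder names: replace table_name and my_cluster_col with your own.
-- The value is a comma-separated list of columns to collect statistics for.
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'my_cluster_col');&lt;/PRE&gt;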
&lt;P&gt;&lt;SPAN&gt;2.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;Then run the following query to recompute statistics so that they cover that column:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN class="k"&gt;ANALYZE&lt;/SPAN&gt; &lt;SPAN class="k"&gt;TABLE&lt;/SPAN&gt; &lt;SPAN class="k"&gt;table_name&lt;/SPAN&gt; &lt;SPAN class="n"&gt;COMPUTE&lt;/SPAN&gt; &lt;SPAN class="n"&gt;DELTA&lt;/SPAN&gt; &lt;SPAN class="k"&gt;STATISTICS&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;3. Use the column in the&amp;nbsp;CLUSTER BY clause for liquid clustering.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Nov 2024 13:49:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/liquid-clustering/m-p/99958#M8783</guid>
      <dc:creator>PotnuruSiva</dc:creator>
      <dc:date>2024-11-25T13:49:55Z</dc:date>
    </item>
  </channel>
</rss>

