<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Liquid Clustering Key Change Question in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/liquid-clustering-key-change-question/m-p/112318#M9177</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152093"&gt;@ShivangiB&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;You're correct in your understanding. When you change a clustering key using ALTER TABLE followed by OPTIMIZE, it doesn't automatically recluster existing data. Let me explain why this happens and what options you have.In Delta Lake (which Databricks uses), the ALTER TABLE CLUSTER BY statement only changes the clustering specification for future write operations. The OPTIMIZE command without additional parameters will only compact small files but won't reorder data according to the new clustering key.For your existing data to benefit from the new clustering key (key2), you need to explicitly request reclustering. You have two options:&lt;/P&gt;&lt;OL class=""&gt;&lt;LI&gt;Run OPTIMIZE with ZORDER BY:&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;SPAN class=""&gt;OPTIMIZE&lt;/SPAN&gt; &lt;SPAN class=""&gt;table&lt;/SPAN&gt;&lt;SPAN class=""&gt; ZORDER &lt;/SPAN&gt;&lt;SPAN class=""&gt;BY&lt;/SPAN&gt; &lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;key2&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Or rewrite the table entirely:&amp;nbsp;&amp;nbsp;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CREATE&lt;/SPAN&gt; &lt;SPAN class=""&gt;OR&lt;/SPAN&gt; &lt;SPAN class=""&gt;REPLACE&lt;/SPAN&gt; &lt;SPAN class=""&gt;TABLE&lt;/SPAN&gt; table_temp &lt;/SPAN&gt;&lt;SPAN class=""&gt;CLUSTER &lt;SPAN class=""&gt;BY&lt;/SPAN&gt; &lt;SPAN class=""&gt;(&lt;/SPAN&gt;key2&lt;SPAN class=""&gt;)&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;AS&lt;/SPAN&gt; &lt;SPAN class=""&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class=""&gt;*&lt;/SPAN&gt; &lt;SPAN class=""&gt;FROM&lt;/SPAN&gt; &lt;SPAN class=""&gt;table&lt;/SPAN&gt;&lt;SPAN class=""&gt;;&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;-- Then rename&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; tables&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ALTER&lt;/SPAN&gt; &lt;SPAN class=""&gt;TABLE&lt;/SPAN&gt; &lt;SPAN class=""&gt;table&lt;/SPAN&gt; &lt;SPAN class=""&gt;RENAME&lt;/SPAN&gt; &lt;SPAN class=""&gt;TO&lt;/SPAN&gt; table_old&lt;SPAN class=""&gt;;&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;ALTER&lt;/SPAN&gt; &lt;SPAN class=""&gt;TABLE&lt;/SPAN&gt; table_temp &lt;SPAN class=""&gt;RENAME&lt;/SPAN&gt; &lt;SPAN class=""&gt;TO&lt;/SPAN&gt; &lt;SPAN class=""&gt;table&lt;/SPAN&gt;&lt;SPAN class=""&gt;;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;The reason your test showed that existing files weren't rewritten is because OPTIMIZE alone primarily focuses on file compaction rather than data reorganization. For large historical datasets, you need to explicitly trigger the reclustering process to see performance improvements for queries filtering on key2.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 12 Mar 2025 02:58:25 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2025-03-12T02:58:25Z</dc:date>
    <item>
      <title>Liquid Clustering Key Change Question</title>
      <link>https://community.databricks.com/t5/get-started-discussions/liquid-clustering-key-change-question/m-p/111833#M9175</link>
      <description>&lt;P&gt;If i already have a cluster key1 for existing table, i want to change cluster key to key2 using&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;SPAN&gt;ALTER&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;TABLE&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;table CLUSTER&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;BY&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;(key2), then run&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;OPTIMIZE table&lt;/SPAN&gt;&lt;SPAN&gt;, based on databrick document , existing files will not be rewritten (verified by my test as well), cluster Key2 will only be applied incrementally.&amp;nbsp; If my existing files are huge (saying having long history), using Key2 will not improve performance when i search the table (e.g. select * from table where key2='xxxx'). is liquid cluster supposed to work this way or i am missing anything?&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 05 Mar 2025 15:22:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/liquid-clustering-key-change-question/m-p/111833#M9175</guid>
      <dc:creator>ShivangiB</dc:creator>
      <dc:date>2025-03-05T15:22:54Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid Clustering Key Change Question</title>
      <link>https://community.databricks.com/t5/get-started-discussions/liquid-clustering-key-change-question/m-p/112315#M9176</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152093"&gt;@ShivangiB&lt;/a&gt;&amp;nbsp;Yes, Liquid cluster is supposed to work that way. The behavior that only applies new keys to the new data supports the&amp;nbsp;characteristic of "Tables with access patterns that change over time". It allows us to change clustering keys based on query pattern changes without rewriting the historical data.&lt;/P&gt;
&lt;P&gt;If the historical data needs to use the new clustering keys, we can use &lt;A href="https://docs.databricks.com/aws/en/delta/clustering#force-reclustering-for-all-records" target="_blank"&gt;OPTIMIZE FULL&lt;/A&gt;, which requires I/O costs and time.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Mar 2025 02:38:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/liquid-clustering-key-change-question/m-p/112315#M9176</guid>
      <dc:creator>koji_kawamura</dc:creator>
      <dc:date>2025-03-12T02:38:32Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid Clustering Key Change Question</title>
      <link>https://community.databricks.com/t5/get-started-discussions/liquid-clustering-key-change-question/m-p/112318#M9177</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152093"&gt;@ShivangiB&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;You're correct in your understanding. When you change a clustering key using ALTER TABLE followed by OPTIMIZE, it doesn't automatically recluster existing data. Let me explain why this happens and what options you have.In Delta Lake (which Databricks uses), the ALTER TABLE CLUSTER BY statement only changes the clustering specification for future write operations. The OPTIMIZE command without additional parameters will only compact small files but won't reorder data according to the new clustering key.For your existing data to benefit from the new clustering key (key2), you need to explicitly request reclustering. You have two options:&lt;/P&gt;&lt;OL class=""&gt;&lt;LI&gt;Run OPTIMIZE with ZORDER BY:&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;SPAN class=""&gt;OPTIMIZE&lt;/SPAN&gt; &lt;SPAN class=""&gt;table&lt;/SPAN&gt;&lt;SPAN class=""&gt; ZORDER &lt;/SPAN&gt;&lt;SPAN class=""&gt;BY&lt;/SPAN&gt; &lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;key2&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Or rewrite the table entirely:&amp;nbsp;&amp;nbsp;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CREATE&lt;/SPAN&gt; &lt;SPAN class=""&gt;OR&lt;/SPAN&gt; &lt;SPAN class=""&gt;REPLACE&lt;/SPAN&gt; &lt;SPAN class=""&gt;TABLE&lt;/SPAN&gt; table_temp &lt;/SPAN&gt;&lt;SPAN class=""&gt;CLUSTER &lt;SPAN class=""&gt;BY&lt;/SPAN&gt; &lt;SPAN class=""&gt;(&lt;/SPAN&gt;key2&lt;SPAN class=""&gt;)&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;AS&lt;/SPAN&gt; &lt;SPAN class=""&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class=""&gt;*&lt;/SPAN&gt; &lt;SPAN class=""&gt;FROM&lt;/SPAN&gt; &lt;SPAN class=""&gt;table&lt;/SPAN&gt;&lt;SPAN class=""&gt;;&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;-- Then rename&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; tables&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ALTER&lt;/SPAN&gt; &lt;SPAN class=""&gt;TABLE&lt;/SPAN&gt; &lt;SPAN class=""&gt;table&lt;/SPAN&gt; &lt;SPAN class=""&gt;RENAME&lt;/SPAN&gt; &lt;SPAN class=""&gt;TO&lt;/SPAN&gt; table_old&lt;SPAN class=""&gt;;&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;ALTER&lt;/SPAN&gt; &lt;SPAN class=""&gt;TABLE&lt;/SPAN&gt; table_temp &lt;SPAN class=""&gt;RENAME&lt;/SPAN&gt; &lt;SPAN class=""&gt;TO&lt;/SPAN&gt; &lt;SPAN class=""&gt;table&lt;/SPAN&gt;&lt;SPAN class=""&gt;;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;The reason your test showed that existing files weren't rewritten is because OPTIMIZE alone primarily focuses on file compaction rather than data reorganization. For large historical datasets, you need to explicitly trigger the reclustering process to see performance improvements for queries filtering on key2.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Mar 2025 02:58:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/liquid-clustering-key-change-question/m-p/112318#M9177</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-03-12T02:58:25Z</dc:date>
    </item>
  </channel>
</rss>

