<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Liquid Clustering VS Z-ordering in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-vs-z-ordering/m-p/157984#M54643</link>
    <description>&lt;P&gt;I want to understand the difference between Liquid Clustering and Z-ordering, and also how each of them works.&lt;/P&gt;</description>
    <pubDate>Sun, 31 May 2026 08:06:44 GMT</pubDate>
    <dc:creator>Rupa0503</dc:creator>
    <dc:date>2026-05-31T08:06:44Z</dc:date>
    <item>
      <title>Liquid Clustering VS Z-ordering</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-vs-z-ordering/m-p/157984#M54643</link>
      <description>&lt;P&gt;I want to understand the difference between Liquid Clustering and Z-ordering, and also how each of them works.&lt;/P&gt;</description>
      <pubDate>Sun, 31 May 2026 08:06:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-clustering-vs-z-ordering/m-p/157984#M54643</guid>
      <dc:creator>Rupa0503</dc:creator>
      <dc:date>2026-05-31T08:06:44Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid Clustering VS Z-ordering</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-vs-z-ordering/m-p/157985#M54644</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/229992"&gt;@Rupa0503&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;P&gt;Liquid Clustering is basically the modern replacement for Z-ordering. Both improve data skipping (faster reads), but Liquid fixes a lot of Z-order's headaches.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;How They Work (and why Liquid wins)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Z-Ordering:&lt;/STRONG&gt; It's rigid. When you add new data and run OPTIMIZE with ZORDER BY, it often has to rewrite a lot of your existing files to keep things sorted. That makes it slow and computationally expensive.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Liquid Clustering:&lt;/STRONG&gt; It's flexible and &lt;STRONG&gt;incremental&lt;/STRONG&gt;. When you run OPTIMIZE, Databricks only rewrites the data that still needs to be clustered. It's much faster to maintain, handles skewed data better, and lets you change clustering keys without rewriting the whole table.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;How to Use It / Migrate&lt;/STRONG&gt; Moving from Z-order to Liquid is easy using ALTER TABLE:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Use Standard Liquid:&lt;/STRONG&gt; ALTER TABLE table_name CLUSTER BY (col1, col2) &lt;I&gt;(Just remember to run OPTIMIZE afterward so the data actually gets clustered!)&lt;/I&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Use Auto Liquid:&lt;/STRONG&gt; ALTER TABLE table_name CLUSTER BY AUTO &lt;I&gt;(Note: requires Predictive Optimization to be enabled)&lt;/I&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Turn it off:&lt;/STRONG&gt; ALTER TABLE table_name CLUSTER BY NONE&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;My Personal Benchmarks &amp;amp; Recommendation&lt;/STRONG&gt; I tested Z-order, Standard Liquid, and Auto Liquid with the exact same data and tables. 
Here is the verdict:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Reads:&lt;/STRONG&gt; All three performed about the same.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Writes/Optimization:&lt;/STRONG&gt; Auto Liquid was the fastest in my tests.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Cost (My Pick):&lt;/STRONG&gt; I personally stick to &lt;STRONG&gt;Standard Liquid Clustering&lt;/STRONG&gt; to save money. Auto Liquid uses Predictive Optimization, which runs on serverless compute and adds extra cost. Standard Liquid gives you all the incremental speed benefits over Z-order, but leaves you in control of your compute bill!&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Sun, 31 May 2026 09:39:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-clustering-vs-z-ordering/m-p/157985#M54644</guid>
      <dc:creator>ShamenParis</dc:creator>
      <dc:date>2026-05-31T09:39:00Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid Clustering VS Z-ordering</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-vs-z-ordering/m-p/157988#M54646</link>
      <description>&lt;P&gt;&lt;A href="https://community.databricks.com/t5/user/viewprofilepage/user-id/229992" target="_blank"&gt;@Rupa0503&lt;/A&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Both are layout optimizations that improve Delta Lake query performance, but they differ in flexibility and maintenance.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;Z-Ordering&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is an optimization approach that co-locates related data across&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN class=""&gt;multiple&lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;columns within files, based on the columns you choose.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN class=""&gt;You manually specify columns via&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;STRONG&gt;OPTIMIZE table_name ZORDER BY (col1, col2)&lt;/STRONG&gt;&amp;nbsp;&lt;SPAN class=""&gt;and run&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;OPTIMIZE&lt;/STRONG&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;periodically to maintain the layout as data grows. 
It's&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;ideal for stable&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;legacy&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;read-heavy workloads with predictable filter patterns.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;During OPTIMIZE&lt;SPAN class=""&gt;, files are rewritten to&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;interleave&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;values across the specified columns, which improves data skipping for multi-column filters.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;You can use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;Z-Ordering&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for legacy tables whose filter columns are stable.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;Liquid Clustering&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;&amp;nbsp;is the modern approach and my recommendation for new tables&lt;/SPAN&gt;&lt;SPAN class=""&gt;. 
It uses a tree-based algorithm to incrementally organize data by clustering keys&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN class=""&gt;without&lt;/SPAN&gt;&lt;SPAN class=""&gt;&amp;nbsp;full rewrites.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;Dynamic&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;: Change clustering keys at any time via&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;STRONG&gt;CLUSTER BY (cols)&lt;/STRONG&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;without rewriting existing data.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;Automatic &amp;amp; Incremental&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;: Supports&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;STRONG&gt;CLUSTER BY AUTO&lt;/STRONG&gt;&lt;SPAN class=""&gt;&amp;nbsp;to let Databricks select clustering keys based on query history&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;Handles complexity&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;: Better for high-cardinality columns, skewed data, or evolving query patterns.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;Liquid Clustering&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for new tables with high-cardinality filters, concurrent writes, or evolving query patterns.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN class=""&gt;More details&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://www.canadiandataguy.com/p/optimizing-delta-lake-tables-liquid" rel="noopener nofollow noreferrer" target="_blank"&gt;here&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 31 May 2026 10:54:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-clustering-vs-z-ordering/m-p/157988#M54646</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-05-31T10:54:29Z</dc:date>
    </item>
  </channel>
</rss>

