<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Seeking Insights on Liquid Clustering (LC) Based on Table Sizes in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/138766#M51007</link>
    <description>&lt;P&gt;Liquid Clustering replaces manual partitioning and Z-Ordering with &lt;STRONG&gt;adaptive file clustering&lt;/STRONG&gt;.&lt;BR /&gt;It keeps your data physically organized for faster queries and merges, without forcing you to manage partition columns or compaction jobs.&lt;/P&gt;&lt;P&gt;It’s powered by &lt;STRONG&gt;cluster-by keys&lt;/STRONG&gt;, Delta’s internal &lt;STRONG&gt;clustering metadata&lt;/STRONG&gt;, and &lt;STRONG&gt;incremental reclustering&lt;/STRONG&gt; that runs as part of OPTIMIZE (or automatically where predictive optimization is enabled).&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;Table Size&lt;/TH&gt;&lt;TH&gt;Rough Range&lt;/TH&gt;&lt;TH&gt;LC Benefit&lt;/TH&gt;&lt;TH&gt;Notes&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Small&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&amp;lt; 10 GB or &amp;lt; 50 million rows&lt;/TD&gt;&lt;TD&gt;&lt;EM&gt;Limited&lt;/EM&gt;&lt;/TD&gt;&lt;TD&gt;Metadata overhead may outweigh the benefit. Stick with Delta defaults or a light Z-ORDER.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Medium&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;10 GB – 1 TB or 50M–1B rows&lt;/TD&gt;&lt;TD&gt;&lt;EM&gt;Strong&lt;/EM&gt;&lt;/TD&gt;&lt;TD&gt;Ideal range: LC improves scan times, merges, and compaction efficiency.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Large&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&amp;gt; 1 TB or billions of rows&lt;/TD&gt;&lt;TD&gt;&lt;EM&gt;Very high&lt;/EM&gt;&lt;/TD&gt;&lt;TD&gt;Major gains in data skipping and read performance, especially for multi-year or multi-tenant data.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;You can find more in the documentation. If you have a specific case rather than a generic one, I am more than happy to advise.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/clustering" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/delta/clustering&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/best-practices" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/delta/best-practices&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/optimize" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/delta/optimize&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 12 Nov 2025 11:46:50 GMT</pubDate>
    <dc:creator>bianca_unifeye</dc:creator>
    <dc:date>2025-11-12T11:46:50Z</dc:date>
    <item>
      <title>Seeking Insights on Liquid Clustering (LC) Based on Table Sizes</title>
      <link>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/138764#M51005</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I'm exploring Liquid Clustering (LC) and its effectiveness based on the size of the tables.&lt;BR /&gt;Specifically, I’m interested in understanding how LC behaves with small, medium, and large tables and the best practices for each, along with size ranges for each category.&lt;/P&gt;&lt;P&gt;Any recommendations or best practices for applying LC across different table sizes would be appreciated!&lt;BR /&gt;Looking forward to hearing about your experiences and insights on whether LC should be adopted at various data scales and what the tangible benefits are.&lt;/P&gt;&lt;P&gt;Thanks in advance for sharing your knowledge!&lt;/P&gt;</description>
      <pubDate>Wed, 12 Nov 2025 11:00:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/138764#M51005</guid>
      <dc:creator>pooja_bhumandla</dc:creator>
      <dc:date>2025-11-12T11:00:35Z</dc:date>
    </item>
    <item>
      <title>Re: Seeking Insights on Liquid Clustering (LC) Based on Table Sizes</title>
      <link>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/138766#M51007</link>
      <description>&lt;P&gt;Liquid Clustering replaces manual partitioning and Z-Ordering with &lt;STRONG&gt;adaptive file clustering&lt;/STRONG&gt;.&lt;BR /&gt;It keeps your data physically organized for faster queries and merges, without forcing you to manage partition columns or compaction jobs.&lt;/P&gt;&lt;P&gt;It’s powered by &lt;STRONG&gt;cluster-by keys&lt;/STRONG&gt;, Delta’s internal &lt;STRONG&gt;clustering metadata&lt;/STRONG&gt;, and &lt;STRONG&gt;incremental reclustering&lt;/STRONG&gt; that runs as part of OPTIMIZE (or automatically where predictive optimization is enabled).&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;Table Size&lt;/TH&gt;&lt;TH&gt;Rough Range&lt;/TH&gt;&lt;TH&gt;LC Benefit&lt;/TH&gt;&lt;TH&gt;Notes&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Small&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&amp;lt; 10 GB or &amp;lt; 50 million rows&lt;/TD&gt;&lt;TD&gt;&lt;EM&gt;Limited&lt;/EM&gt;&lt;/TD&gt;&lt;TD&gt;Metadata overhead may outweigh the benefit. Stick with Delta defaults or a light Z-ORDER.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Medium&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;10 GB – 1 TB or 50M–1B rows&lt;/TD&gt;&lt;TD&gt;&lt;EM&gt;Strong&lt;/EM&gt;&lt;/TD&gt;&lt;TD&gt;Ideal range: LC improves scan times, merges, and compaction efficiency.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Large&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&amp;gt; 1 TB or billions of rows&lt;/TD&gt;&lt;TD&gt;&lt;EM&gt;Very high&lt;/EM&gt;&lt;/TD&gt;&lt;TD&gt;Major gains in data skipping and read performance, especially for multi-year or multi-tenant data.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;You can find more in the documentation. If you have a specific case rather than a generic one, I am more than happy to advise.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/clustering" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/delta/clustering&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/best-practices" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/delta/best-practices&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/optimize" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/delta/optimize&lt;/A&gt;&lt;/P&gt;</description>
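The size tiers in this reply can be turned into a quick triage over a table inventory. What follows is only a rough sketch: the thresholds are the rule-of-thumb ranges quoted in the reply, not official Databricks limits, decimal units are an assumption, and the table names and the classify_table helper are hypothetical.

```python
from dataclasses import dataclass

# Assumption: decimal units. The reply does not say which convention it uses.
GB = 10**9
TB = 10**12


@dataclass(frozen=True)
class Tier:
    name: str
    lc_benefit: str


def classify_table(size_bytes: int) -> Tier:
    """Map a table's size on disk to the rough tiers quoted in the reply."""
    if size_bytes > 1 * TB:
        return Tier("large", "very high")
    if size_bytes >= 10 * GB:
        return Tier("medium", "strong")
    return Tier("small", "limited")


# Hypothetical inventory: table name -> size in bytes.
tables = {"dim_country": 1_000, "orders": 250 * GB, "events": 5 * TB}
for name, size in tables.items():
    tier = classify_table(size)
    print(f"{name}: {tier.name} table, expected LC benefit {tier.lc_benefit}")
```

Size alone is a coarse signal; query filters and update patterns matter at least as much, so treat the output as a starting point for a closer look rather than a decision.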
      <pubDate>Wed, 12 Nov 2025 11:46:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/138766#M51007</guid>
      <dc:creator>bianca_unifeye</dc:creator>
      <dc:date>2025-11-12T11:46:50Z</dc:date>
    </item>
    <item>
      <title>Re: Seeking Insights on Liquid Clustering (LC) Based on Table Sizes</title>
      <link>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/138787#M51012</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193092"&gt;@bianca_unifeye&lt;/a&gt;, thank you for your response.&lt;/P&gt;&lt;P&gt;My tables range in size from 1 KB to 5 TB. Given this, I’d love to hear your thoughts and experiences on whether Liquid Clustering (LC) would be a good fit in this scenario.&lt;/P&gt;&lt;P&gt;Thanks in advance for sharing your knowledge!&lt;/P&gt;</description>
      <pubDate>Wed, 12 Nov 2025 13:53:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/138787#M51012</guid>
      <dc:creator>pooja_bhumandla</dc:creator>
      <dc:date>2025-11-12T13:53:24Z</dc:date>
    </item>
    <item>
      <title>Re: Seeking Insights on Liquid Clustering (LC) Based on Table Sizes</title>
      <link>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/139046#M51075</link>
      <description>&lt;P&gt;For tables ranging from &lt;STRONG&gt;1 KB to 5 TB&lt;/STRONG&gt;, you’ll usually end up with a mixed strategy. LC is not “all or nothing”; it shines when the physical size and update pattern justify the clustering overhead.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Use Liquid Clustering when:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;clustering keys have natural selectivity (e.g., customer_id, timestamp)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;MERGE/DELETE/UPDATE operations happen regularly&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;the table grows continuously&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;multiple teams access different slices of the data&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;you want predictable performance without manual tuning&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Avoid Liquid Clustering when:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;the data is tiny&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;the table rarely changes&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;the workload is mostly sequential full scans&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;the cluster-by keys have low cardinality&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
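Once a table passes the checklist above, switching it over comes down to a couple of SQL statements. Here is a minimal sketch that only builds the statement strings, assuming Databricks SQL's ALTER TABLE ... CLUSTER BY and OPTIMIZE commands. The table and column names are hypothetical, and the helper functions are illustrative, not part of any Databricks library.

```python
import re

# Databricks documents a limit of four clustering columns per table.
MAX_CLUSTER_KEYS = 4
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(identifier: str) -> str:
    """Accept only simple unquoted identifiers so nothing unexpected lands in the SQL."""
    if not _IDENT.fullmatch(identifier):
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return identifier


def _table_name(table: str) -> str:
    """Validate each part of a dotted name such as catalog.schema.table."""
    return ".".join(_check(part) for part in table.split("."))


def enable_lc_sql(table: str, keys: list[str]) -> list[str]:
    """Statements that set clustering keys and then trigger clustering work."""
    if not keys or len(keys) > MAX_CLUSTER_KEYS:
        raise ValueError(f"expected 1 to {MAX_CLUSTER_KEYS} clustering keys")
    name = _table_name(table)
    cols = ", ".join(_check(key) for key in keys)
    return [f"ALTER TABLE {name} CLUSTER BY ({cols})", f"OPTIMIZE {name}"]


def disable_lc_sql(table: str) -> list[str]:
    """Statement that removes the clustering keys from a table."""
    return [f"ALTER TABLE {_table_name(table)} CLUSTER BY NONE"]


# Hypothetical table and keys, chosen to match the selectivity advice in the reply.
for statement in enable_lc_sql("main.sales.orders", ["customer_id", "order_ts"]):
    print(statement)
```

On a cluster, each string could be passed to spark.sql(...). Keeping the generation separate from execution makes it easy to review the statements for a mixed fleet of tables before anything runs.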
      <pubDate>Fri, 14 Nov 2025 10:08:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/seeking-insights-on-liquid-clustering-lc-based-on-table-sizes/m-p/139046#M51075</guid>
      <dc:creator>bianca_unifeye</dc:creator>
      <dc:date>2025-11-14T10:08:54Z</dc:date>
    </item>
  </channel>
</rss>

