<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Liquid clustering not improved performance in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-not-improved-performance/m-p/127346#M47929</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175916"&gt;@SusmithaBadam&lt;/a&gt;, based on your use case, the partitioned table performs better because partitions work like labeled folders. When you group by the partition columns, the engine can go straight to the relevant folders instead of scanning everything, so the query is much faster.&lt;/P&gt;&lt;P&gt;Liquid clustering, on the other hand, shines when you need to filter on other, more detailed (high-cardinality) columns, but for your group-by queries on the partition columns it can’t take that shortcut. So for your current setup, sticking with partitioned tables makes more sense performance-wise.&lt;/P&gt;</description>
    <pubDate>Mon, 04 Aug 2025 14:36:25 GMT</pubDate>
    <dc:creator>Renu_</dc:creator>
    <dc:date>2025-08-04T14:36:25Z</dc:date>
    <item>
      <title>Liquid clustering not improved performance</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-not-improved-performance/m-p/127308#M47919</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;I have a 160 GB table partitioned on the country and yearmonth columns. I keep 6 years of history and replace the latest 2 months of partitions to add new data.&lt;/P&gt;&lt;P&gt;I use overwrite mode to replace the affected partitions. The entire ETL process runs without failures, but the data partitions are heavily skewed. I did a POC with liquid clustering on a reduced table of 45 GB, but did not see much improvement.&lt;/P&gt;&lt;P&gt;Observation:&lt;/P&gt;&lt;P&gt;A SELECT with GROUP BY on the clustered table, after running OPTIMIZE, takes 39 sec, whereas the same query on the partitioned table takes 2 sec. Write performance improved, but read performance is much worse.&lt;/P&gt;&lt;P&gt;I have attached an Excel file with the read/write performance differences. I would like to take advantage of liquid clustering, but have had no luck so far.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Aug 2025 09:27:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-clustering-not-improved-performance/m-p/127308#M47919</guid>
      <dc:creator>SusmithaBadam</dc:creator>
      <dc:date>2025-08-04T09:27:20Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid clustering not improved performance</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-not-improved-performance/m-p/127346#M47929</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175916"&gt;@SusmithaBadam&lt;/a&gt;, based on your use case, the partitioned table performs better because partitions work like labeled folders. When you group by the partition columns, the engine can go straight to the relevant folders instead of scanning everything, so the query is much faster.&lt;/P&gt;&lt;P&gt;Liquid clustering, on the other hand, shines when you need to filter on other, more detailed (high-cardinality) columns, but for your group-by queries on the partition columns it can’t take that shortcut. So for your current setup, sticking with partitioned tables makes more sense performance-wise.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Aug 2025 14:36:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-clustering-not-improved-performance/m-p/127346#M47929</guid>
      <dc:creator>Renu_</dc:creator>
      <dc:date>2025-08-04T14:36:25Z</dc:date>
    </item>
  </channel>
</rss>

