<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64528#M32595</link>
    <description>&lt;P&gt;It seems that for this specific query, Liquid Clustering performs worse. It does not improve performance for every query.&lt;/P&gt;&lt;P&gt;&lt;EM&gt;The following are examples of scenarios that benefit from clustering:&lt;/EM&gt;&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables often filtered by high cardinality columns.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables with significant skew in data distribution.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables that grow quickly and require maintenance and tuning effort.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables with concurrent write requirements.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables with access patterns that change over time.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables where a typical partition key could leave the table with too many or too few partitions.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Mon, 25 Mar 2024 15:15:12 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2024-03-25T15:15:12Z</dc:date>
    <item>
      <title>OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE</title>
      <link>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64180#M32483</link>
      <description>&lt;P&gt;I created a Delta table with 15 million records, and I'm running a simple filter query on that table based on one column value. The query returns only one record, because all the values in that column are unique.&lt;/P&gt;&lt;P&gt;The Delta table is not partitioned.&lt;/P&gt;&lt;P&gt;Before enabling Liquid Clustering and running OPTIMIZE, the query response time was less than a second.&lt;/P&gt;&lt;P&gt;After enabling Liquid Clustering and running OPTIMIZE, the query takes 3 to 4 seconds.&lt;/P&gt;&lt;P&gt;If I only enable Liquid Clustering without running OPTIMIZE, the query response time is still less than a second.&lt;/P&gt;&lt;P&gt;What is going on here?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Mar 2024 11:07:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64180#M32483</guid>
      <dc:creator>SankaraiahNaray</dc:creator>
      <dc:date>2024-03-20T11:07:18Z</dc:date>
    </item>
    <item>
      <title>Re: OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE</title>
      <link>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64258#M32507</link>
      <description>&lt;P&gt;Is the column you filter on one of the Liquid Clustering keys or not? That could be the explanation.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 09:04:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64258#M32507</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-03-21T09:04:39Z</dc:date>
    </item>
    <item>
      <title>Re: OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE</title>
      <link>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64274#M32513</link>
      <description>&lt;P&gt;Yes, the column is used as the clustering key.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 10:50:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64274#M32513</guid>
      <dc:creator>SankaraiahNaray</dc:creator>
      <dc:date>2024-03-21T10:50:48Z</dc:date>
    </item>
    <item>
      <title>Re: OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE</title>
      <link>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64528#M32595</link>
      <description>&lt;P&gt;It seems that for this specific query, Liquid Clustering performs worse. It does not improve performance for every query.&lt;/P&gt;&lt;P&gt;&lt;EM&gt;The following are examples of scenarios that benefit from clustering:&lt;/EM&gt;&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables often filtered by high cardinality columns.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables with significant skew in data distribution.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables that grow quickly and require maintenance and tuning effort.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables with concurrent write requirements.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables with access patterns that change over time.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables where a typical partition key could leave the table with too many or too few partitions.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 25 Mar 2024 15:15:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64528#M32595</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-03-25T15:15:12Z</dc:date>
    </item>
    <item>
      <title>Re: OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE</title>
      <link>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64530#M32597</link>
      <description>&lt;P&gt;I'm testing a scenario mentioned in the documentation (nothing complex).&lt;/P&gt;&lt;P&gt;The filter I'm using is on a high-cardinality column (every record is unique), and my table is not partitioned, so this is a straightforward case of the following scenario:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P&gt;&lt;EM&gt;Tables often filtered by high cardinality columns.&lt;/EM&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 25 Mar 2024 15:21:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimize-with-liquid-clustering-makes-filter-slower-than-without/m-p/64530#M32597</guid>
      <dc:creator>SankaraiahNaray</dc:creator>
      <dc:date>2024-03-25T15:21:23Z</dc:date>
    </item>
  </channel>
</rss>

