<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why is My MIN MAX Query Still Slow on a 29TB Delta Table After Liquid Clustering and Optimizatio in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/75115#M34872</link>
    <description>&lt;P&gt;Ah, then your table needed its statistics recomputed; glad it works now.&lt;BR /&gt;As for string types, it should work just as well.&lt;BR /&gt;"Slow" may be a bit subjective. You have not yet mentioned the warehouse tier or cluster config; are you using sufficient processing power?&lt;/P&gt;</description>
    <pubDate>Thu, 20 Jun 2024 06:31:23 GMT</pubDate>
    <dc:creator>jacovangelder</dc:creator>
    <dc:date>2024-06-20T06:31:23Z</dc:date>
    <item>
      <title>Why is My MIN MAX Query Still Slow on a 29TB Delta Table After Liquid Clustering and Optimization?</title>
      <link>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74012#M34698</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a large Delta table with a size of 29TB. I implemented Liquid Clustering on this table, but running a simple MIN MAX query on the set cluster column is still extremely slow. I have already optimized the table. Am I missing something in my implementation?&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2024 11:17:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74012#M34698</guid>
      <dc:creator>laudhon</dc:creator>
      <dc:date>2024-06-14T11:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: Why is My MIN MAX Query Still Slow on a 29TB Delta Table After Liquid Clustering and Optimizatio</title>
      <link>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74013#M34699</link>
      <description>&lt;P&gt;What is the data type of the field you're querying?&lt;BR /&gt;All I can see is the name "_PartitionColumnUTC_". Judging by the name, it is a date/timestamp, but that is an assumption on my part.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2024 11:59:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74013#M34699</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-06-14T11:59:47Z</dc:date>
    </item>
    <item>
      <title>Re: Why is My MIN MAX Query Still Slow on a 29TB Delta Table After Liquid Clustering and Optimizatio</title>
      <link>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74035#M34706</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;This operation should take seconds because it uses the precomputed statistics for the table. A few things to verify:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;If the data type is datetime or integer, it should work; if it is a string data type, the query needs to read all the data.&lt;/LI&gt;&lt;LI&gt;Verify the column position. By default Delta Lake only collects statistics for some columns (the first 32, I believe, controlled by the delta.dataSkippingNumIndexedCols table property); if the column is not among them, the query needs to read all the data.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you do need to read all the data: I saw in the image that only 4 tasks are running, which means 4 cores, so I would recommend a bigger cluster in memory and cores (scale up) with fewer nodes to reduce the shuffle.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2024 14:15:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74035#M34706</guid>
      <dc:creator>LuisRSanchez</dc:creator>
      <dc:date>2024-06-14T14:15:30Z</dc:date>
    </item>
    <item>
      <title>Re: Why is My MIN MAX Query Still Slow on a 29TB Delta Table After Liquid Clustering and Optimizatio</title>
      <link>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74110#M34731</link>
      <description>&lt;P&gt;Hello, it is an integer column in the format YYYYMMDD.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Jun 2024 06:22:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74110#M34731</guid>
      <dc:creator>laudhon</dc:creator>
      <dc:date>2024-06-15T06:22:16Z</dc:date>
    </item>
    <item>
      <title>Re: Why is My MIN MAX Query Still Slow on a 29TB Delta Table After Liquid Clustering and Optimizatio</title>
      <link>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74287#M34736</link>
      <description>&lt;P&gt;That is strange; the min/max of an integer column should be retrievable very quickly, especially if it is a partition column. Are you 100% sure it is an integer column and not a string? Did you specify any filter clauses in your queries that could trigger a full table scan?&lt;/P&gt;</description>
      <pubDate>Sat, 15 Jun 2024 18:45:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/74287#M34736</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-06-15T18:45:39Z</dc:date>
    </item>
    <item>
      <title>Re: Why is My MIN MAX Query Still Slow on a 29TB Delta Table After Liquid Clustering and Optimizatio</title>
      <link>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/75110#M34870</link>
      <description>&lt;P&gt;Hi Jaco,&lt;/P&gt;&lt;P&gt;Using the ANALYZE TABLE command fixed the issue; however, I am still experiencing very slow queries on a different cluster key of type STRING. Does liquid clustering not support the STRING type very well?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jun 2024 06:20:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/75110#M34870</guid>
      <dc:creator>laudhon</dc:creator>
      <dc:date>2024-06-20T06:20:52Z</dc:date>
    </item>
    <item>
      <title>Re: Why is My MIN MAX Query Still Slow on a 29TB Delta Table After Liquid Clustering and Optimizatio</title>
      <link>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/75115#M34872</link>
      <description>&lt;P&gt;Ah, then your table needed its statistics recomputed; glad it works now.&lt;BR /&gt;As for string types, it should work just as well.&lt;BR /&gt;"Slow" may be a bit subjective. You have not yet mentioned the warehouse tier or cluster config; are you using sufficient processing power?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jun 2024 06:31:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-is-my-min-max-query-still-slow-on-a-29tb-delta-table-after/m-p/75115#M34872</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-06-20T06:31:23Z</dc:date>
    </item>
  </channel>
</rss>

