<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic delta table grouping by key which is not partitioned by is very slow in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10133#M5378</link>
    <description>&lt;P&gt;I have a large Delta table with timestamp, key, and metric columns (e.g. m1, m2, ...).&lt;/P&gt;&lt;P&gt;I often group by the key (e.g. select max(m1) group by timestamp, key).&lt;/P&gt;&lt;P&gt;I cannot partition by `key` because it has too many distinct values (~200K).&lt;/P&gt;&lt;P&gt;I have tried running OPTIMIZE with ZORDER on key, but it does not help.&lt;/P&gt;&lt;P&gt;Bottom line: simple queries take several minutes.&lt;/P&gt;&lt;P&gt;Any idea what to do?&lt;/P&gt;</description>
    <pubDate>Fri, 03 Feb 2023 00:15:48 GMT</pubDate>
    <dc:creator>chanansh</dc:creator>
    <dc:date>2023-02-03T00:15:48Z</dc:date>
    <item>
      <title>delta table grouping by key which is not partitioned by is very slow</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10133#M5378</link>
      <description>&lt;P&gt;I have a large Delta table with timestamp, key, and metric columns (e.g. m1, m2, ...).&lt;/P&gt;&lt;P&gt;I often group by the key (e.g. select max(m1) group by timestamp, key).&lt;/P&gt;&lt;P&gt;I cannot partition by `key` because it has too many distinct values (~200K).&lt;/P&gt;&lt;P&gt;I have tried running OPTIMIZE with ZORDER on key, but it does not help.&lt;/P&gt;&lt;P&gt;Bottom line: simple queries take several minutes.&lt;/P&gt;&lt;P&gt;Any idea what to do?&lt;/P&gt;</description>
      <pubDate>Fri, 03 Feb 2023 00:15:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10133#M5378</guid>
      <dc:creator>chanansh</dc:creator>
      <dc:date>2023-02-03T00:15:48Z</dc:date>
    </item>
    <item>
      <title>Re: delta table grouping by key which is not partitioned by is very slow</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10134#M5379</link>
      <description>&lt;P&gt;@Hanan Shteingart: We suggest using Delta Lake's OPTIMIZE command. OPTIMIZE can improve query performance by compacting the table's small data files into larger ones. Combined with ZORDER BY, it also co-locates rows with similar values in the same files, which improves data locality and lets queries that filter on those columns skip more files.&lt;/P&gt;&lt;P&gt;Use Delta Lake's CLUSTER BY clause: if you cannot partition the table on the key column because it has too many unique values, you can cluster the table on that column with CLUSTER BY instead.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2023 08:22:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10134#M5379</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-07T08:22:33Z</dc:date>
    </item>
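    <!--
    Editor's sketch, not part of the original thread. The reply above says that
    OPTIMIZE with ZORDER improves data locality. The toy model below, in plain
    Python with no Spark or Delta dependency, illustrates one plausible reason
    the original poster still saw slow queries: per-file min/max statistics let
    a reader skip files when a query filters on the key, but an unfiltered
    GROUP BY has to read every file however the data is laid out. The helper
    names (write_files, files_scanned), the file size, and the use of a plain
    sort as a stand-in for Z-ordering are all assumptions made for illustration.

    ```python
    import random


    def write_files(keys, rows_per_file):
        """Split keys into fixed-size 'files' and keep only (min, max) per file."""
        files = [keys[i:i + rows_per_file] for i in range(0, len(keys), rows_per_file)]
        return [(min(f), max(f)) for f in files]


    def files_scanned(stats, wanted_key=None):
        """Count files a reader must open.

        With a key filter, a file is skipped when the key lies outside its
        (min, max) range. With no filter (a full GROUP BY), nothing is skipped.
        """
        if wanted_key is None:
            return len(stats)
        return sum(1 for lo, hi in stats if lo <= wanted_key <= hi)


    random.seed(0)
    # Roughly 200K possible key values, as in the question; 100K rows in total.
    keys = [random.randrange(200_000) for _ in range(100_000)]

    unclustered = write_files(keys, 1_000)
    # Sorting by key is a simplified stand-in for ZORDER BY (key).
    clustered = write_files(sorted(keys), 1_000)

    wanted = 100_000  # hypothetical key used in a point filter
    filter_unclustered = files_scanned(unclustered, wanted)
    filter_clustered = files_scanned(clustered, wanted)
    groupby_unclustered = files_scanned(unclustered)
    groupby_clustered = files_scanned(clustered)

    print("filter on key, unclustered:", filter_unclustered)
    print("filter on key, clustered:  ", filter_clustered)
    print("group by key, unclustered: ", groupby_unclustered)
    print("group by key, clustered:   ", groupby_clustered)
    ```

    In this model, clustering sharply reduces the files touched by a filter on
    the key, while the unfiltered aggregation reads the same number of files in
    both layouts. If that matches the real workload, adding a selective filter
    (for example on a time range) may help more than further reclustering.
    -->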
    <item>
      <title>Re: delta table grouping by key which is not partitioned by is very slow</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10135#M5380</link>
      <description>Been there done that &lt;span class="lia-unicode-emoji" title=":smiling_face_with_smiling_eyes:"&gt;😊&lt;/span&gt; still super slow for anything interactive.</description>
      <pubDate>Mon, 13 Mar 2023 06:44:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10135#M5380</guid>
      <dc:creator>chanansh</dc:creator>
      <dc:date>2023-03-13T06:44:09Z</dc:date>
    </item>
    <item>
      <title>Re: delta table grouping by key which is not partitioned by is very slow</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10136#M5381</link>
      <description>&lt;P&gt;Hi @Hanan Shteingart,&lt;/P&gt;&lt;P&gt;Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 10 Apr 2023 06:48:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-grouping-by-key-which-is-not-partitioned-by-is-very/m-p/10136#M5381</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-10T06:48:29Z</dc:date>
    </item>
  </channel>
</rss>

