<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is it possible to control the ordering of the array values created by array_agg()? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65961#M32971</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103324"&gt;@ThomazRossito&lt;/a&gt;&lt;/P&gt;&lt;P&gt;This is a great idea. It can solve my problem. Thank you.&lt;/P&gt;</description>
    <pubDate>Wed, 10 Apr 2024 00:25:45 GMT</pubDate>
    <dc:creator>akisugi</dc:creator>
    <dc:date>2024-04-10T00:25:45Z</dc:date>
    <item>
      <title>Is it possible to control the ordering of the array values created by array_agg()?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65701#M32897</link>
      <description>&lt;P&gt;Hi! I have a question about `array_agg()`.&lt;/P&gt;&lt;P&gt;I have the following data.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="スクリーンショット 2024-04-06 23.08.15.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6950i2D556C183691D6B1/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="スクリーンショット 2024-04-06 23.08.15.png" alt="スクリーンショット 2024-04-06 23.08.15.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I would like to get the following result, with the values in `move` following the order of `hist`.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="スクリーンショット 2024-04-06 23.07.34.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6949i8DA6FDDC5C6DC020/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="スクリーンショット 2024-04-06 23.07.34.png" alt="スクリーンショット 2024-04-06 23.07.34.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;So I wrote the following query.&lt;/P&gt;&lt;P&gt;```&lt;BR /&gt;with tmp as (&lt;BR /&gt;select * from (values&lt;BR /&gt;(1, 1, 'a'),&lt;BR /&gt;(1, 3, 'c'),&lt;BR /&gt;(1, 2, 'b'),&lt;BR /&gt;(2, 1, 'a'),&lt;BR /&gt;(2, 2, 'b'),&lt;BR /&gt;(2, 3, null),&lt;BR /&gt;(3, 3, 'b'),&lt;BR /&gt;(3, 1, 'a'),&lt;BR /&gt;(3, 2, null),&lt;BR /&gt;(4, 1, 'a')&lt;BR /&gt;) as tab(custid, hist, alf)&lt;BR /&gt;order by custid asc, hist asc&lt;BR /&gt;)&lt;BR /&gt;select&lt;BR /&gt;custid,&lt;BR /&gt;array_join(array_agg(alf), ' -&amp;gt; ') as move&lt;BR /&gt;from tmp&lt;BR /&gt;group by custid&lt;BR /&gt;order by custid asc;&lt;BR /&gt;```&lt;/P&gt;&lt;P&gt;I tried sorting with `order by` before combining the strings with `array_join(array_agg())`. At first glance it seems to work, but the official documentation for `array_agg()` states the following:&lt;/P&gt;&lt;P&gt;```&lt;BR /&gt;The order of elements in the array is non-deterministic.&lt;/P&gt;&lt;P&gt;From | &lt;A href="https://docs.databricks.com/en/sql/language-manual/functions/array_agg.html#:~:text=The%20order%20of%20elements%20in%20the%20array%20is%20non%2Ddeterministic.%20NULL%20values%20are%20excluded" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/sql/language-manual/functions/array_agg.html#:~:text=The%20order%20of%20elements%20in%20the%20array%20is%20non%2Ddeterministic.%20NULL%20values%20are%20excluded&lt;/A&gt;.&lt;BR /&gt;```&lt;/P&gt;&lt;P&gt;Does it make sense to sort with `order by` before running `array_join(array_agg())`? Will it work as expected with larger data sizes?&lt;/P&gt;&lt;P&gt;I apologize for the inconvenience and would appreciate your response.&lt;/P&gt;</description>
      <pubDate>Sat, 06 Apr 2024 14:12:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65701#M32897</guid>
      <dc:creator>akisugi</dc:creator>
      <dc:date>2024-04-06T14:12:37Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to control the ordering of the array values created by array_agg()?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65729#M32904</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I believe the "sort_array" function can help in your scenario.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/sql/language-manual/functions/sort_array.html" target="_blank"&gt;https://docs.databricks.com/en/sql/language-manual/functions/sort_array.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ThomazRossito_0-1712504147539.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6955i300B36EF5A0B77D2/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="ThomazRossito_0-1712504147539.png" alt="ThomazRossito_0-1712504147539.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 07 Apr 2024 15:38:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65729#M32904</guid>
      <dc:creator>ThomazRossito</dc:creator>
      <dc:date>2024-04-07T15:38:03Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to control the ordering of the array values created by array_agg()?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65885#M32948</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103324"&gt;@ThomazRossito&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Thanks for your response, and sorry for my late reply.&lt;/P&gt;&lt;P&gt;I apologize that my question did not accurately describe my scenario.&lt;/P&gt;&lt;P&gt;It is true that the `sort_array` function works well for this sample data.&lt;/P&gt;&lt;P&gt;However, the array values do not necessarily follow an alphabetical or numeric order.&lt;/P&gt;&lt;P&gt;------------------------------&lt;/P&gt;&lt;P&gt;with tmp as (&lt;BR /&gt;select * from (values&lt;BR /&gt;(1, 1, 'cherry'),&lt;BR /&gt;(1, 3, 'strawberries'),&lt;BR /&gt;(1, 2, 'acerola'),&lt;BR /&gt;(1, 5, 'banan'),&lt;BR /&gt;(1, 4, 'lemon')&lt;BR /&gt;) as tab(custid, hist, alf)&lt;BR /&gt;order by custid asc, hist asc&lt;BR /&gt;)&lt;BR /&gt;select&lt;BR /&gt;custid,&lt;BR /&gt;array_join(array_agg(alf), ' -&amp;gt; ') as move&lt;BR /&gt;from tmp&lt;BR /&gt;group by custid&lt;BR /&gt;order by custid asc;&lt;/P&gt;&lt;P&gt;------------------------------&lt;/P&gt;&lt;P&gt;I would like to get an array like `[cherry -&amp;gt; acerola -&amp;gt; strawberries -&amp;gt; lemon -&amp;gt; banan]` for the sample data above. I want the array values to be sorted by, and kept in, the order of the `hist` column, but the `array_agg()` documentation states `The order of elements in the array is non-deterministic`.&lt;/P&gt;&lt;P&gt;If you have a solution, I would appreciate your response!&lt;/P&gt;</description>
      <pubDate>Tue, 09 Apr 2024 12:14:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65885#M32948</guid>
      <dc:creator>akisugi</dc:creator>
      <dc:date>2024-04-09T12:14:47Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to control the ordering of the array values created by array_agg()?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65931#M32960</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Below is a solution:&lt;BR /&gt;regexp_replace(array_join(sort_array(array_agg(concat(hist, alf))), ' -&amp;gt; '),'[0-9]','')&lt;/P&gt;&lt;P&gt;The idea is to concatenate the "hist" column with the "alf" column so that "sort_array" sorts on the numbers from "hist". Finally, "regexp_replace" removes those numbers, since they should not appear in the final result. Note that this relies on the "hist" values all having the same number of digits (the concatenated values are sorted as strings) and on "alf" containing no digits of its own, because the pattern '[0-9]' removes every digit.&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ThomazRossito_0-1712704018151.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6988i67902A91F73BAA9F/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="ThomazRossito_0-1712704018151.png" alt="ThomazRossito_0-1712704018151.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Apr 2024 23:11:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65931#M32960</guid>
      <dc:creator>ThomazRossito</dc:creator>
      <dc:date>2024-04-09T23:11:06Z</dc:date>
    </item>
    <item>
      <title>Re: Is it possible to control the ordering of the array values created by array_agg()?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65961#M32971</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103324"&gt;@ThomazRossito&lt;/a&gt;&lt;/P&gt;&lt;P&gt;This is a great idea. It can solve my problem. Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Apr 2024 00:25:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-possible-to-control-the-ordering-of-the-array-values/m-p/65961#M32971</guid>
      <dc:creator>akisugi</dc:creator>
      <dc:date>2024-04-10T00:25:45Z</dc:date>
    </item>
  </channel>
</rss>

