<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks SQL string_agg in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/17433#M11454</link>
    <description>&lt;P&gt;The rough equivalent would be using collect_set and array_join but note you have lost the order:&lt;/P&gt;&lt;P&gt;Use this ​&lt;/P&gt;&lt;P&gt;SELECT col1, array_join(collect_set(col2), ',') j&lt;/P&gt;&lt;P&gt;FROM tmp&lt;/P&gt;&lt;P&gt;GROUP BY col1&lt;/P&gt;&lt;P&gt;​&lt;/P&gt;</description>
    <pubDate>Sat, 10 Dec 2022 11:24:14 GMT</pubDate>
    <dc:creator>Heman2</dc:creator>
    <dc:date>2022-12-10T11:24:14Z</dc:date>
    <item>
      <title>Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/17432#M11453</link>
      <description>&lt;P&gt;Migrating some on-premise SQL views to Databricks and struggling to find conversions for some functions. the main one is the string_agg function.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;string_agg(field_name, ', ')&lt;/P&gt;&lt;P&gt;​&lt;/P&gt;&lt;P&gt;Anyone know how to convert that to Databricks SQL?&lt;/P&gt;&lt;P&gt;​&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Dec 2022 11:22:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/17432#M11453</guid>
      <dc:creator>Harish2122</dc:creator>
      <dc:date>2022-12-10T11:22:20Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/17433#M11454</link>
      <description>&lt;P&gt;The rough equivalent would be using collect_set and array_join but note you have lost the order:&lt;/P&gt;&lt;P&gt;Use this ​&lt;/P&gt;&lt;P&gt;SELECT col1, array_join(collect_set(col2), ',') j&lt;/P&gt;&lt;P&gt;FROM tmp&lt;/P&gt;&lt;P&gt;GROUP BY col1&lt;/P&gt;&lt;P&gt;​&lt;/P&gt;</description>
      <pubDate>Sat, 10 Dec 2022 11:24:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/17433#M11454</guid>
      <dc:creator>Heman2</dc:creator>
      <dc:date>2022-12-10T11:24:14Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/17434#M11455</link>
      <description>&lt;P&gt;Hi @Harish K​&amp;nbsp;you can use the below query in spark SQL-&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%sql
SELECT col1, array_join(collect_set(col2), ',') j
FROM tmp
GROUP BY col1&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 10 Dec 2022 17:40:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/17434#M11455</guid>
      <dc:creator>Ajay-Pandey</dc:creator>
      <dc:date>2022-12-10T17:40:39Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/82089#M36511</link>
      <description>&lt;P&gt;Here's a way that preserves ordering. It seems way to complicated to me, I'm hoping there is a more elegant way someone else can provide in a subsequent comment.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;%sql
;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT 
    array_join(collect_set(col1), ',') as combined_string
FROM Blah;
--ghi,def,abc

---------------------

--but what if you want to preserve ordering?
;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT 
    ARRAY_JOIN
    (
        TRANSFORM
        (
            ARRAY_SORT
            (
                ARRAY_AGG( (col1, i) ),
                (left, right) -&amp;gt; CASE WHEN left.i &amp;lt; right.i THEN -1 WHEN left.i &amp;gt; right.i THEN 1 ELSE 0 END
            ),
            x -&amp;gt; x.col1
        ),
        ','
    ) as combined_string
FROM Blah;
--abc,def,ghi&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 06 Aug 2024 20:33:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/82089#M36511</guid>
      <dc:creator>eriodega</dc:creator>
      <dc:date>2024-08-06T20:33:50Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/83969#M37086</link>
      <description>&lt;P&gt;Note: it would be great if support was added for a STRING_AGG function. Here's how simple it is to write the same order-preserving query in Postgres SQL (as an example):&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT STRING_AGG(col1, ',' ORDER BY i ASC)
FROM Blah&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 22 Aug 2024 19:40:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/83969#M37086</guid>
      <dc:creator>eriodega</dc:creator>
      <dc:date>2024-08-22T19:40:15Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/86653#M37325</link>
      <description>&lt;P&gt;On a support case there is now a Databricks Aha Idea request created for an order-preserving string_agg function (reference number DB-I-11734).&lt;/P&gt;</description>
      <pubDate>Thu, 29 Aug 2024 19:48:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/86653#M37325</guid>
      <dc:creator>eriodega</dc:creator>
      <dc:date>2024-08-29T19:48:30Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/90794#M37997</link>
      <description>&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;If not grouping by something else:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;SELECT array_join(collect_set(field_name&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;) field_list&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; FROM table&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 17 Sep 2024 21:16:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/90794#M37997</guid>
      <dc:creator>smueller</dc:creator>
      <dc:date>2024-09-17T21:16:41Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/90798#M37998</link>
      <description>&lt;P&gt;Hmmm, when I try it, I get multiple rows back (I desire to only get one row back):&lt;/P&gt;&lt;LI-CODE lang="c"&gt;;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)

SELECT array_join(collect_set(col1) over (order by i), ',')
FROM Blah&lt;/LI-CODE&gt;&lt;P&gt;Results:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;abc
def,abc
def,ghi,abc&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 17 Sep 2024 21:18:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/90798#M37998</guid>
      <dc:creator>eriodega</dc:creator>
      <dc:date>2024-09-17T21:18:37Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/90828#M38002</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/115065"&gt;@eriodega&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;If you want just to get a single row then do not use OVER:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;%sql
WITH Blah(col1, i) AS (
    SELECT 'abc', 1
    UNION
    SELECT 'def', 2
    UNION
    SELECT 'ghi', 3
)

SELECT array_join(collect_set(col1), ',') AS concatenated_string
FROM Blah&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Results:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="filipniziol_0-1726642975502.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11305iCED44DDA3C02F229/image-size/medium?v=v2&amp;amp;px=400" role="button" title="filipniziol_0-1726642975502.png" alt="filipniziol_0-1726642975502.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Sep 2024 07:03:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/90828#M38002</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2024-09-18T07:03:05Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/90903#M38025</link>
      <description>&lt;P&gt;yes, that returns one row. The ordering of the resultant string is non-deterministic though (I just ran it and got "def,abc,ghi"), and that is likely fine for most people's use-cases (in fact Heman2 mentioned in in the first answer in this thread). However, if one is looking for ordering, it won't be suitable and they may need to resort to the array_join,transform,array_sort,array_agg,lambda answer I posted above. I was just complaining about the cumbersome nature of it and don't mean to belabor this thread.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Sep 2024 13:13:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/90903#M38025</guid>
      <dc:creator>eriodega</dc:creator>
      <dc:date>2024-09-18T13:13:32Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL string_agg</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/106130#M42396</link>
      <description>&lt;P&gt;Adding the proper window specification for the "OVER" clause plus DISTINCT helps to achieve some resemblance of STRING_AGG for the simple&amp;nbsp; ascending order:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;select distinct
    object
    ,array_join(array_sort(collect_set(property) over (partition by object order by property ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)), ',') as properties
from 
    (
    values
    ('object 1','C')
    ,('object 2','F')
    ,('object 1','B')
    ,('object 2','E')
    ,('object 1','A')
    ,('object 2','D')
    ) as t(object,property)
order by object&lt;/LI-CODE&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="50%"&gt;object&lt;/TD&gt;&lt;TD width="50%"&gt;properties&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="50%"&gt;object 1&lt;/TD&gt;&lt;TD width="50%"&gt;A,B,C&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="50%"&gt;object 2&lt;/TD&gt;&lt;TD width="50%"&gt;D,E,F&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2025 19:05:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-string-agg/m-p/106130#M42396</guid>
      <dc:creator>just-Vlad</dc:creator>
      <dc:date>2025-01-17T19:05:12Z</dc:date>
    </item>
  </channel>
</rss>

