cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Databricks SQL string_agg

Harish2122
Contributor

Migrating some on-premise SQL views to Databricks and struggling to find conversions for some functions. the main one is the string_agg function.

string_agg(field_name, ', ')

โ€‹

Anyone know how to convert that to Databricks SQL?

โ€‹

Thanks in advance.

9 REPLIES 9

Heman2
Valued Contributor II

The rough equivalent would be using collect_set and array_join but note you have lost the order:

Use this โ€‹

SELECT col1, array_join(collect_set(col2), ',') j

FROM tmp

GROUP BY col1

โ€‹

Ajay-Pandey
Esteemed Contributor III

Hi @Harish Kโ€‹ you can use the below query in spark SQL-

%sql
SELECT col1, array_join(collect_set(col2), ',') j
FROM tmp
GROUP BY col1

Ajay Kumar Pandey

eriodega
Contributor

Here's a way that preserves ordering. It seems way to complicated to me, I'm hoping there is a more elegant way someone else can provide in a subsequent comment.

%sql
;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT 
    array_join(collect_set(col1), ',') as combined_string
FROM Blah;
--ghi,def,abc

---------------------

--but what if you want to preserve ordering?
;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT 
    ARRAY_JOIN
    (
        TRANSFORM
        (
            ARRAY_SORT
            (
                ARRAY_AGG( (col1, i) ),
                (left, right) -> CASE WHEN left.i < right.i THEN -1 WHEN left.i > right.i THEN 1 ELSE 0 END
            ),
            x -> x.col1
        ),
        ','
    ) as combined_string
FROM Blah;
--abc,def,ghi

eriodega
Contributor

Note: it would be great if support was added for a STRING_AGG function. Here's how simple it is to write the same order-preserving query in Postgres SQL (as an example):

;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT STRING_AGG(col1, ',' ORDER BY i ASC)
FROM Blah

eriodega
Contributor

On a support case there is now a Databricks Aha Idea request created for an order-preserving string_agg function (reference number DB-I-11734).

smueller
New Contributor II
If not grouping by something else:
 
SELECT array_join(collect_set(field_name), ',') field_list
    FROM table

Hmmm, when I try it, I get multiple rows back (I desire to only get one row back):

;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)

SELECT array_join(collect_set(col1) over (order by i), ',')
FROM Blah

Results:

abc
def,abc
def,ghi,abc

Hi @eriodega ,

If you want just to get a single row then do not use OVER:

%sql
WITH Blah(col1, i) AS (
    SELECT 'abc', 1
    UNION
    SELECT 'def', 2
    UNION
    SELECT 'ghi', 3
)

SELECT array_join(collect_set(col1), ',') AS concatenated_string
FROM Blah

 Results:

filipniziol_0-1726642975502.png

 

yes, that returns one row. The ordering of the resultant string is non-deterministic though (I just ran it and got "def,abc,ghi"), and that is likely fine for most people's use-cases (in fact Heman2 mentioned in in the first answer in this thread). However, if one is looking for ordering, it won't be suitable and they may need to resort to the array_join,transform,array_sort,array_agg,lambda answer I posted above. I was just complaining about the cumbersome nature of it and don't mean to belabor this thread.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group