cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Databricks SQL string_agg

Harish2122
Contributor

Migrating some on-premise SQL views to Databricks and struggling to find conversions for some functions. the main one is the string_agg function.

string_agg(field_name, ', ')

โ€‹

Anyone know how to convert that to Databricks SQL?

โ€‹

Thanks in advance.

5 REPLIES 5

Heman2
Valued Contributor II

The rough equivalent would be using collect_set and array_join but note you have lost the order:

Use this โ€‹

SELECT col1, array_join(collect_set(col2), ',') j

FROM tmp

GROUP BY col1

โ€‹

Ajay-Pandey
Esteemed Contributor III

Hi @Harish Kโ€‹ you can use the below query in spark SQL-

%sql
SELECT col1, array_join(collect_set(col2), ',') j
FROM tmp
GROUP BY col1

Ajay Kumar Pandey

eriodega
New Contributor III

Here's a way that preserves ordering. It seems way to complicated to me, I'm hoping there is a more elegant way someone else can provide in a subsequent comment.

%sql
;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT 
    array_join(collect_set(col1), ',') as combined_string
FROM Blah;
--ghi,def,abc

---------------------

--but what if you want to preserve ordering?
;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT 
    ARRAY_JOIN
    (
        TRANSFORM
        (
            ARRAY_SORT
            (
                ARRAY_AGG( (col1, i) ),
                (left, right) -> CASE WHEN left.i < right.i THEN -1 WHEN left.i > right.i THEN 1 ELSE 0 END
            ),
            x -> x.col1
        ),
        ','
    ) as combined_string
FROM Blah;
--abc,def,ghi

eriodega
New Contributor III

Note: it would be great if support was added for a STRING_AGG function. Here's how simple it is to write the same order-preserving query in Postgres SQL (as an example):

;WITH Blah(col1, i) as
(
          SELECT 'abc',1
    UNION SELECT 'def',2
    UNION SELECT 'ghi',3
)
SELECT STRING_AGG(col1, ',' ORDER BY i ASC)
FROM Blah

eriodega
New Contributor III

On a support case there is now a Databricks Aha Idea request created for an order-preserving string_agg function (reference number DB-I-11734).

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group