โ12-10-2022 03:22 AM
Migrating some on-premise SQL views to Databricks and struggling to find conversions for some functions. the main one is the string_agg function.
string_agg(field_name, ', ')
โ
Anyone know how to convert that to Databricks SQL?
โ
Thanks in advance.
โ12-10-2022 03:24 AM
The rough equivalent would be using collect_set and array_join but note you have lost the order:
Use this โ
SELECT col1, array_join(collect_set(col2), ',') j
FROM tmp
GROUP BY col1
โ
โ12-10-2022 09:40 AM
Hi @Harish Kโ you can use the below query in spark SQL-
%sql
SELECT col1, array_join(collect_set(col2), ',') j
FROM tmp
GROUP BY col1
โ08-06-2024 01:33 PM
Here's a way that preserves ordering. It seems way to complicated to me, I'm hoping there is a more elegant way someone else can provide in a subsequent comment.
%sql
;WITH Blah(col1, i) as
(
SELECT 'abc',1
UNION SELECT 'def',2
UNION SELECT 'ghi',3
)
SELECT
array_join(collect_set(col1), ',') as combined_string
FROM Blah;
--ghi,def,abc
---------------------
--but what if you want to preserve ordering?
;WITH Blah(col1, i) as
(
SELECT 'abc',1
UNION SELECT 'def',2
UNION SELECT 'ghi',3
)
SELECT
ARRAY_JOIN
(
TRANSFORM
(
ARRAY_SORT
(
ARRAY_AGG( (col1, i) ),
(left, right) -> CASE WHEN left.i < right.i THEN -1 WHEN left.i > right.i THEN 1 ELSE 0 END
),
x -> x.col1
),
','
) as combined_string
FROM Blah;
--abc,def,ghi
โ08-22-2024 12:40 PM
Note: it would be great if support was added for a STRING_AGG function. Here's how simple it is to write the same order-preserving query in Postgres SQL (as an example):
;WITH Blah(col1, i) as
(
SELECT 'abc',1
UNION SELECT 'def',2
UNION SELECT 'ghi',3
)
SELECT STRING_AGG(col1, ',' ORDER BY i ASC)
FROM Blah
โ08-29-2024 12:48 PM
On a support case there is now a Databricks Aha Idea request created for an order-preserving string_agg function (reference number DB-I-11734).
โ09-17-2024 02:09 PM - edited โ09-17-2024 02:16 PM
โ09-17-2024 02:18 PM
Hmmm, when I try it, I get multiple rows back (I desire to only get one row back):
;WITH Blah(col1, i) as
(
SELECT 'abc',1
UNION SELECT 'def',2
UNION SELECT 'ghi',3
)
SELECT array_join(collect_set(col1) over (order by i), ',')
FROM Blah
Results:
abc
def,abc
def,ghi,abc
โ09-18-2024 12:03 AM
Hi @eriodega ,
If you want just to get a single row then do not use OVER:
%sql
WITH Blah(col1, i) AS (
SELECT 'abc', 1
UNION
SELECT 'def', 2
UNION
SELECT 'ghi', 3
)
SELECT array_join(collect_set(col1), ',') AS concatenated_string
FROM Blah
Results:
โ09-18-2024 06:13 AM
yes, that returns one row. The ordering of the resultant string is non-deterministic though (I just ran it and got "def,abc,ghi"), and that is likely fine for most people's use-cases (in fact Heman2 mentioned in in the first answer in this thread). However, if one is looking for ordering, it won't be suitable and they may need to resort to the array_join,transform,array_sort,array_agg,lambda answer I posted above. I was just complaining about the cumbersome nature of it and don't mean to belabor this thread.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group