cancel
Showing results for 
Search instead for 
Did you mean: 
Warehousing & Analytics
cancel
Showing results for 
Search instead for 
Did you mean: 

What is the databricks SQL equivalent string_agg in Postgres SQL?

128244
New Contributor II

Hello,

I am starting to use databricks and have some handy functions with Postgres SQL that I am struggling to find an equivalent in databricks. 

The function is string_agg. It is used to concatenate a list of strings with a given delimiter. More info can be found in the link. For my specific use case, I have: a list of values in rows: a, b , c. And I want to collapse them to 1 row and have the output be a->b->c. In postgres it is string_agg(rows, '->'), and then grouping by if needed.

Many thanks,

Julie

3 REPLIES 3

AmanSehgal
Honored Contributor III

You can use concat_ws for concatenating row values into one column.

Eg:

from pyspark.sql.functions import concat_ws,col
from pyspark.sql.types import StructType,StructField, StringType
 
data = [("A1","B1","C1","D1"),
         ("A2","B2","C2","D2"),
         ("A3","B3","C3","D3"),
         ("A4","B4","C3","D4")
       ]
 
schema = StructType([ \
    StructField("A",StringType(),True), \
    StructField("B",StringType(),True), \
    StructField("C",StringType(),True),\
   StructField("D",StringType(),True)
  ])
 
df = spark.createDataFrame(data=data,schema=schema)
df.printSchema()
df.show()
 
df.select(concat_ws('_',df.A,df.B,df.C).alias("ABC"),"D").show()

Output:

root
 |-- A: string (nullable = true)
 |-- B: string (nullable = true)
 |-- C: string (nullable = true)
 |-- 😧 string (nullable = true)
 
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
| A1| B1| C1| D1|
| A2| B2| C2| D2|
| A3| B3| C3| D3|
| A4| B4| C3| D4|
+---+---+---+---+
 
+--------+---+
|     ABC|  D|
+--------+---+
|A1_B1_C1| D1|
|A2_B2_C2| D2|
|A3_B3_C3| D3|
|A4_B4_C3| D4|
+--------+---+

128244
New Contributor II

thanks so much for the quick reply. I am not familiar with py.spark, do you know if this is all possible within SQL?

Vidula
Honored Contributor

Hey there @Julie Calhoun​ 

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.