cancel
Showing results for 
Search instead for 
Did you mean: 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results for 
Search instead for 
Did you mean: 

What is the databricks SQL equivalent string_agg in Postgres SQL?

128244
New Contributor II

Hello,

I am starting to use databricks and have some handy functions with Postgres SQL that I am struggling to find an equivalent in databricks. 

The function is string_agg. It is used to concatenate a list of strings with a given delimiter. More info can be found in the link. For my specific use case, I have: a list of values in rows: a, b , c. And I want to collapse them to 1 row and have the output be a->b->c. In postgres it is string_agg(rows, '->'), and then grouping by if needed.

Many thanks,

Julie

3 REPLIES 3

AmanSehgal
Honored Contributor III

You can use concat_ws for concatenating row values into one column.

Eg:

from pyspark.sql.functions import concat_ws,col
from pyspark.sql.types import StructType,StructField, StringType
 
data = [("A1","B1","C1","D1"),
         ("A2","B2","C2","D2"),
         ("A3","B3","C3","D3"),
         ("A4","B4","C3","D4")
       ]
 
schema = StructType([ \
    StructField("A",StringType(),True), \
    StructField("B",StringType(),True), \
    StructField("C",StringType(),True),\
   StructField("D",StringType(),True)
  ])
 
df = spark.createDataFrame(data=data,schema=schema)
df.printSchema()
df.show()
 
df.select(concat_ws('_',df.A,df.B,df.C).alias("ABC"),"D").show()

Output:

root
 |-- A: string (nullable = true)
 |-- B: string (nullable = true)
 |-- C: string (nullable = true)
 |-- 😧 string (nullable = true)
 
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
| A1| B1| C1| D1|
| A2| B2| C2| D2|
| A3| B3| C3| D3|
| A4| B4| C3| D4|
+---+---+---+---+
 
+--------+---+
|     ABC|  D|
+--------+---+
|A1_B1_C1| D1|
|A2_B2_C2| D2|
|A3_B3_C3| D3|
|A4_B4_C3| D4|
+--------+---+

128244
New Contributor II

thanks so much for the quick reply. I am not familiar with py.spark, do you know if this is all possible within SQL?

Vidula
Honored Contributor

Hey there @Julie Calhoun​ 

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group