What is the Databricks SQL equivalent of string_agg in Postgres SQL?
07-20-2022 03:27 AM
Hello,
I am starting to use Databricks, and there are some handy functions in Postgres SQL for which I am struggling to find an equivalent in Databricks.
The function is string_agg. It concatenates a list of strings with a given delimiter (more info can be found in the link). For my specific use case, I have a list of values across rows: a, b, c. I want to collapse them into one row, with the output a->b->c. In Postgres this is string_agg(rows, '->'), plus a GROUP BY if needed.
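Roughly what I run in Postgres today looks like this (table and column names are made up for illustration):

SELECT grp,
       string_agg(val, '->') AS collapsed
FROM my_table
GROUP BY grp;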
Many thanks,
Julie
- Labels:
  - PostgresSQL
  - SQL
  - String Agg
07-20-2022 06:43 AM
You can use concat_ws to concatenate values into a single column with a chosen delimiter.
For example:
from pyspark.sql.functions import concat_ws
from pyspark.sql.types import StructType, StructField, StringType

# Sample data with four string columns
data = [("A1", "B1", "C1", "D1"),
        ("A2", "B2", "C2", "D2"),
        ("A3", "B3", "C3", "D3"),
        ("A4", "B4", "C3", "D4")]

schema = StructType([
    StructField("A", StringType(), True),
    StructField("B", StringType(), True),
    StructField("C", StringType(), True),
    StructField("D", StringType(), True)
])

df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show()

# Concatenate columns A, B and C into one column using "_" as the delimiter
df.select(concat_ws('_', df.A, df.B, df.C).alias("ABC"), "D").show()
Output:
root
|-- A: string (nullable = true)
|-- B: string (nullable = true)
|-- C: string (nullable = true)
|-- D: string (nullable = true)
+---+---+---+---+
| A| B| C| D|
+---+---+---+---+
| A1| B1| C1| D1|
| A2| B2| C2| D2|
| A3| B3| C3| D3|
| A4| B4| C3| D4|
+---+---+---+---+
+--------+---+
| ABC| D|
+--------+---+
|A1_B1_C1| D1|
|A2_B2_C2| D2|
|A3_B3_C3| D3|
|A4_B4_C3| D4|
+--------+---+
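If the goal is to collapse multiple rows into a single delimited string, as in the original string_agg example, combining collect_list with concat_ws (or array_join) in a grouped query should also work, and it can be written in plain SQL. A minimal Spark SQL sketch, assuming a hypothetical table my_table with columns grp and val:

SELECT grp,
       concat_ws('->', collect_list(val)) AS collapsed  -- array_join(collect_list(val), '->') gives the same result
FROM my_table
GROUP BY grp;

Note that collect_list does not guarantee element order, so add an explicit ordering step if the a->b->c sequence matters.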
07-20-2022 07:43 AM
Thanks so much for the quick reply. I am not familiar with PySpark; do you know if this is all possible within SQL?
09-05-2022 04:40 AM
Hey there @Julie Calhoun,
Hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!