How to explode an array column and repack the distinct values into one array in DB SQL?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-23-2022 06:07 AM
Hi, I am new to DB SQL. I have a table where the array column (cities) contains multiple arrays and some have multiple duplicate values. I need to unpack the array values into rows so I can list the distinct values. The following query works for this step.
SELECT distinct
key,
explode(cities) as cities
FROM green.cities
group by key, cities
The next step I want to repack the distinct cities into one array grouped by key.
I can do this easily in pyspark using two dataframes, first by doing an explode on the array column of the first dataframe and then doing a collect_set on the same column in the next dataframe.
Any suggestions how I can do this just using Databricks SQL?
- Labels:
-
Array Column
-
Collect_set
-
Distinct Values
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-23-2022 06:29 AM
You can use collect_set in SQL. If you share the example input row and desired output, it would be easier to help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-23-2022 06:45 AM
Thanks Hubert. What i tried so far is not collecting the distinct values of the expolded column grouped by key.
An example input is:
key, cities
1, ["milan","paris","new york"]
1, ["London"]
1, ["London","paris"]
1, ["London","paris"]
1, ["London","paris"]
1, ["milan","paris"]
1, ["paris","new york"]
1, ["new york"]
1, ["new york"]
2, ["milan","paris","new york"]
2, ["paris"]
2, ["paris"]
2, ["milan","paris"]
2, ["paris","new york"]
2, ["Tokyo"]
2, ["new york"]
2, ["LA","Tokyo"]
2, ["LA","Tokyo"]
The desired output is:
key, cities
1, ["milan","paris","new york","London"]
2, ["milan","paris","new york","LA","Tokyo"]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-23-2022 08:35 PM
try to use SQL windows functions here

