Thanks Hubert. What I have tried so far does not collect the distinct values of the exploded column grouped by key.
An example input is:
key, cities
1, ["milan","paris","new york"]
1, ["London"]
1, ["London","paris"]
1, ["London","paris"]
1, ["London","paris"]
1, ["milan","paris"]
1, ["paris","new york"]
1, ["new york"]
1, ["new york"]
2, ["milan","paris","new york"]
2, ["paris"]
2, ["paris"]
2, ["milan","paris"]
2, ["paris","new york"]
2, ["Tokyo"]
2, ["new york"]
2, ["LA","Tokyo"]
2, ["LA","Tokyo"]
The desired output is:
key, cities
1, ["milan","paris","new york","London"]
2, ["milan","paris","new york","LA","Tokyo"]
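
To make the intended transformation concrete, here is a minimal plain-Python sketch of it, using the sample rows above. It is only an illustration of the logic (explode each list, then keep the distinct values per key), not the code from my attempt. The names `rows` and `distinct_cities_by_key` are made up for this example.

```python
from collections import defaultdict

# Sample input from above: (key, cities) pairs.
rows = [
    (1, ["milan", "paris", "new york"]),
    (1, ["London"]),
    (1, ["London", "paris"]),
    (1, ["London", "paris"]),
    (1, ["London", "paris"]),
    (1, ["milan", "paris"]),
    (1, ["paris", "new york"]),
    (1, ["new york"]),
    (1, ["new york"]),
    (2, ["milan", "paris", "new york"]),
    (2, ["paris"]),
    (2, ["paris"]),
    (2, ["milan", "paris"]),
    (2, ["paris", "new york"]),
    (2, ["Tokyo"]),
    (2, ["new york"]),
    (2, ["LA", "Tokyo"]),
    (2, ["LA", "Tokyo"]),
]


def distinct_cities_by_key(rows):
    """Hypothetical helper: union of all city lists per key, duplicates removed."""
    # A dict per key is used as an ordered set (dicts keep insertion order),
    # so each city is stored once, in the order it was first seen.
    seen = defaultdict(dict)
    for key, cities in rows:
        for city in cities:  # the "explode" step: one entry per list element
            seen[key].setdefault(city, None)
    return {key: list(cities) for key, cities in seen.items()}


result = distinct_cities_by_key(rows)
for key, cities in result.items():
    print(key, cities)
```

The order of the cities inside each list follows first appearance in the input, so it may differ from the order written in the desired output; only the set of values per key matters here. If the data lives in Spark (an assumption on my part, since "exploded" suggests it), the same idea would map to `explode` followed by a `groupBy` on the key with `collect_set`.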