07-11-2022 11:20 AM
I think you are right. We are close.
For Python, let's try exploding twice. If we have [[emailId, date, source], [emailId, date, source], [emailId, date, source]], then let's explode that column out a second time as well so each email ID gets its own row.
```python
import pyspark.sql.functions as F

df = sqlContext.table("owner_final_delta")

# Spark won't accept one explode() nested directly inside another
# (it rejects generators nested in expressions), so run the two
# explodes as separate selects.
emails = df.select(F.explode(df.contacts.emails).alias("email"))
emails.select(F.explode(emails.email).alias("email_ids")).show()
```
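If it helps to sanity-check the two-step explode away from the real table, here is a minimal, self-contained sketch with made-up data (the toy DataFrame and all column values are mine, not from your table):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for contacts.emails: one row holding an array of
# [emailId, date, source] arrays (values are invented for illustration)
toy = spark.createDataFrame(
    [([["e1", "2022-07-01", "web"], ["e2", "2022-07-02", "crm"]],)],
    ["emails"],
)

step1 = toy.select(F.explode("emails").alias("email"))       # one row per inner array
step2 = step1.select(F.explode("email").alias("email_ids"))  # one row per string
step2.show()
```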
And then for SQL, does this command give you errors as well?

```sql
SELECT from_json(contacts:emails[*], 'array<array<string>>') AS emails
FROM owner_final_delta
```

I believe emails is an array of arrays of strings. I want to see if we can get here first without any errors before digging deeper into the nest.
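If that query parses cleanly, one possible next step is to run it from PySpark and apply the same two-step explode to the parsed column. This is only a sketch: it assumes array<array<string>> really is the right shape for your data, and that you're on Databricks, where the colon JSON-path syntax in the query above is supported.

```python
import pyspark.sql.functions as F

parsed = spark.sql(
    "SELECT from_json(contacts:emails[*], 'array<array<string>>') AS emails "
    "FROM owner_final_delta"
)

# Same two-step pattern as above: explode the inner arrays first,
# then explode each array's elements into their own rows.
parsed.select(F.explode("emails").alias("email")) \
      .select(F.explode("email").alias("email_ids")) \
      .show()
```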