I tried your exact explode example and it worked. I then plugged in my data like so:

df = sqlContext.table("owner_final_delta")

import pyspark.sql.functions as F

df.select(F.explode(df.contacts.emails[0].emailId).alias("email")).show()

This worked but notice I am using [0] (index 0 of the emails array). I tried using [*] there but I get an invalid syntax error. Is there a way to loop through all df.contacts.emails and return all the .emailIds in one column?

For the SQL method the column name holding the JSON structure is contacts. So I tried the query exactly as you have written it:

select contacts:contacts.emails[*].emailId from owner_final_delta

This returns this error essentially saying there is an argument type mismatch in that contacts is not a string:

Error in SQL statement: AnalysisException: cannot resolve 'semi_structured_extract_json_multi(spark_catalog.default.owner_final_delta.contacts, '$.contacts.emails[*].emailId')' due to data type mismatch: argument 1 requires string type, however, 'spark_catalog.default.owner_final_delta.contacts' is of array<struct<address:struct<apartment:string,city:string,house:string,poBox:string,sources:array<string>,state:string,street:string,type:string,zip:string>,addresses:array<struct<apartment:string,city:string,house:string,lastSeen:string,poBox:string,sources:array<string>,state:string,street:string,type:string,zip:string>>,contactKey:string,emails:array<struct<emailId:string,lastSeen:string,sources:array<string>>>,lastModifiedDate:string,name:struct<firstNames:array<string>,lastNames:array<string>,middleNames:array<string>,salutations:array<string>,suffixes:array<string>>,phones:array<struct<extension:string,lastSeen:string,lineType:string,number:string,sources:array<string>,validSince:string>>,relationship:string,sources:array<string>>> type.; line 1 pos 7;

'Project [semi_structured_extract_json_multi(spark_catalog.default.owner_final_delta.contacts, '$.contacts.emails[*].emailId') AS emailId#956]

How should this be resolved?