How to convert a string column to an Array of Struct?

Gopal_Sir
New Contributor III

I have a nested struct where one of the fields is a string. It looks something like this:

string =

"[{\"to_loc\":\"6183\",\"to_loc_type\":\"S\",\"qty_allocated\":\"18\"},{\"to_loc\":\"6137\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\"},{\"to_loc\":\"6088\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\"}]"

My goal is to get it into an array of structs, so that each struct in this string can be exploded into a new row, like this:

C1, C2, C3

_ , _ , {"to_loc":"6183","to_loc_type":"S","qty_allocated":"18"}
_ , _ , {"to_loc":"6137","to_loc_type":"S","qty_allocated":"9"}
_ , _ , {"to_loc":"6088","to_loc_type":"S","qty_allocated":"9"}

so that, finally, each of those keys can also be pulled out into its own column.

I've tried casting the string column to an array of structs, but Spark refuses to convert it. Any help on this?

the final schema =

ArrayType(StructType([
    StructField("to_loc", StringType(), True),
    StructField("to_loc_type", StringType(), True),
    StructField("qty_allocated", StringType(), True)
]))
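For context, a small, self-contained sketch of the situation described above (the column names c1, c2 and c3 are placeholders made up for illustration; only c3 matters, it holds the JSON array as a plain string):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

json_str = (
    '[{"to_loc":"6183","to_loc_type":"S","qty_allocated":"18"},'
    '{"to_loc":"6137","to_loc_type":"S","qty_allocated":"9"},'
    '{"to_loc":"6088","to_loc_type":"S","qty_allocated":"9"}]'
)

# c3 holds the JSON array as a plain string; a direct cast of c3 to the
# array-of-struct schema is what Spark rejects here.
df = spark.createDataFrame([("a", "b", json_str)], ["c1", "c2", "c3"])
df.printSchema()  # c3 is still just a string at this point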


5 REPLIES

-werners-
Esteemed Contributor III

OK, this is not a complete answer, but my first guess would be to use the explode() or posexplode() function to create separate records from the array members.
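A minimal sketch of that idea, assuming the JSON string has already been parsed into an array-of-struct column (the DataFrame name df_parsed and the column names are assumptions, not from the original post):

from pyspark.sql import functions as F

# explode() produces one output row per array element
exploded = df_parsed.select("c1", "c2", F.explode("c3").alias("alloc"))

# posexplode() does the same but also returns the element's position in the array
with_pos = df_parsed.select("c1", "c2", F.posexplode("c3").alias("pos", "alloc"))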

Gopal_Sir
New Contributor III

OK, so I got it working.

Call the from_json() function with the string column as the first argument and the schema as the second. It will convert it into an array of structs.
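A minimal sketch of the whole solution, using the schema from the question; the DataFrame df and the column names c1/c2/c3 are carried over from the illustrative sketch in the question above:

from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

schema = ArrayType(StructType([
    StructField("to_loc", StringType(), True),
    StructField("to_loc_type", StringType(), True),
    StructField("qty_allocated", StringType(), True)
]))

# Parse the JSON string column into an array of structs
df_parsed = df.withColumn("c3", F.from_json("c3", schema))

# Explode the array so each struct becomes its own row, then pull each
# struct field out into its own column
result = (
    df_parsed
    .select("c1", "c2", F.explode("c3").alias("alloc"))
    .select("c1", "c2", "alloc.to_loc", "alloc.to_loc_type", "alloc.qty_allocated")
)
result.show(truncate=False)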

-werners-
Esteemed Contributor III

Can you mark the question as answered so others can find the solution?

Gopal_Sir
New Contributor III

I've marked my comment as best. Does anything else need to be done?

-werners-
Esteemed Contributor III

Nope 🙂
