cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to convert a string column to Array of Struct ?

Gopal_Sir
New Contributor III

I have a nested struct , where on of the field is a string , it looks something like this ....

string =

"[{\"to_loc\":\"6183\",\"to_loc_type\":\"S\",\"qty_allocated\":\"18\"},{\"to_loc\":\"6137\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\"},{\"to_loc\":\"6088\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\"}]"

my goal is to get it into a Array of Struct , so that each struct in this string can be exploded into a new row . Like this ,

C1,C2, C3

_ , _ , {"to_loc\":\"6183\",\"to_loc_type\":\"S\",\"qty_allocated\":\"18\}

_ , _ , {"to_loc\":\"6137\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\}

_ , _ , {"to_loc\":\"6088\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\}

so that finally each of those keys can also be taken out as a new column

I've tried by casting the string column into array of struct , but spark is refusing to convert my string column . Any help on this

the final schema =

ArrayType(StructType(

[StructField("to_loc",StringType(),True),

StructField("to_loc_type",StringType(),True),

StructField("qty_allocated",StringType(),True)]

))

1 ACCEPTED SOLUTION

Accepted Solutions

Gopal_Sir
New Contributor III

Ok , so I got it working .

Call the from_json() function with string column as input and the schema at second parameter . It will convert it into struct .

View solution in original post

5 REPLIES 5

-werners-
Esteemed Contributor III

Ok this is not a complete answer, but my first guess would be to use the explode() or posexplode() function to create separate records of the array members.

Gopal_Sir
New Contributor III

Ok , so I got it working .

Call the from_json() function with string column as input and the schema at second parameter . It will convert it into struct .

-werners-
Esteemed Contributor III

Can you mark the question as answered so others can find the solution?

Gopal_Sir
New Contributor III

I've marked my comment as best . Does anything else need to be done ?

-werners-
Esteemed Contributor III

Nope 🙂