Re: how to flatten non standard Json files in a da...

steelman · ‎06-07-2022

Thanks for following up.

I didn't get any answers from this forum that I could use directly, but it did help me move forward.

After studying the subject further, I did figure out a solution to the problem.

The best solution was to write a schema before importing the JSON file. It took some time to write the schema and get it into the correct format. Ex:

StructField("data",
    MapType(StringType(), StructType([
        StructField("invoiceId", LongType(), True),
        ...

With the schema in place, Spark was able to understand that the numbered keys directly under the "data" node were map entries whose values are structs, and it handled them correctly.

Once the schema was correctly defined for the JSON file, it was possible to use Spark's explode operation on the "data" node.

Ex:

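from pyspark.sql.functions import explode, col

# explode the "data" map: one row per entry, with columns "key" and "value"
df1 = dfresult.select(explode('data'))
# expand the struct in "value" into one column per field
df2 = df1.select("value.*")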

In the end I got all the data into a normalized table.

Thanks for all the contributions and efforts to help.