06-07-2022 01:49 PM
Thanks for following up.
I didn't get any answers from this forum that I could use directly, but the discussion did help me move forward.
After more study of the subject, I found a solution.
The best solution was to write a schema before importing the JSON file. It took some time to write the schema and get it into the correct format, for example:
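from pyspark.sql.types import StructType, StructField, MapType, StringType, LongType
StructField("data",
    MapType(StringType(), StructType([
        StructField("invoiceId", LongType(), True),
        # ... the remaining invoice fields ...
    ])), True)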
With the schema in place, Spark understood that the numbered keys under the "data" node were map keys whose values were structs, and it handled them correctly.
Once the schema was correctly defined, Spark could run explode on the "data" map column, for example:
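from pyspark.sql.functions import explode
# dfresult is the DataFrame read with the schema; exploding the map gives "key"/"value" columns
df1 = dfresult.select(explode("data"))
# Flatten the struct in "value" into one column per field
df2 = df1.select("value.*")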
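To make the effect of those two steps concrete, here is a plain-Python illustration (no Spark involved) of what exploding the map and selecting "value.*" produces: one row per numbered key, with the struct's fields as columns. The function name normalize and the sample invoice data are hypothetical.

```python
import json


def normalize(document: dict) -> list[dict]:
    """Mimic select(explode("data")).select("value.*") on one parsed document."""
    rows = []
    for _key, value in document.get("data", {}).items():
        # explode emits one (key, value) pair per map entry;
        # "value.*" keeps only the struct's fields, so the key is dropped here.
        rows.append(dict(value))
    return rows


sample = json.loads(
    '{"data": {"1": {"invoiceId": 1001, "amount": 9.5},'
    ' "2": {"invoiceId": 1002, "amount": 12.0}}}'
)
for row in normalize(sample):
    print(row)
# {'invoiceId': 1001, 'amount': 9.5}
# {'invoiceId': 1002, 'amount': 12.0}
```

If the numbered key is needed in the result, Spark can keep it by aliasing both generated columns, e.g. `dfresult.select(explode("data").alias("key", "value")).select("key", "value.*")`.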
In the end, I got all the data into a normalized table.
Thanks for all the contributions and efforts to help.