Re: Read Array of Arrays of Objects JSON file usin...

Joost1024 · 12-17-2025

I guess I was a bit overenthusiastic in accepting the answer.

When I run the following on the single-object array of arrays (as shown in the original post), I get a single row with one column, "value", whose value is null.

from pyspark.sql import functions as F, types as T

inner = T.StructType([
   T.StructField("entity_id", T.StringType(), False),
   T.StructField("state", T.StringType(), True),
   T.StructField("attributes", T.MapType(T.StringType(), T.StringType()), True),
   T.StructField("last_changed", T.StringType(), False),
   T.StructField("last_updated", T.StringType(), False),
])

schema = T.StructType([
   T.StructField("value", T.ArrayType(T.ArrayType(inner)), True)
])

df0 = (spark.read.format("json")
   .option("multiLine", "true")
   .option("primitivesAsString", "true")
   .schema(schema)
   .load("<S3 path>/original-single-item.json"))

display(df0)
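
One possible explanation for the null, sketched in plain Python rather than Spark: if the file's top level is a JSON array of arrays (as the original post describes) and not an object with a "value" key, then a schema that looks for a "value" field has nothing to match. The sample record below is made up for illustration; only the field names come from the schema above.

```python
import json

# Hypothetical sample mirroring the described shape:
# a top-level array containing one array containing one object.
raw = """
[[{"entity_id": "sensor.example",
   "state": "on",
   "attributes": {"unit": "C"},
   "last_changed": "2025-01-01T00:00:00+00:00",
   "last_updated": "2025-01-01T00:00:00+00:00"}]]
"""

doc = json.loads(raw)

# The schema above expects an object with a "value" key at the top level.
# Here the top level is a list, so there is no such key to read.
has_value_key = isinstance(doc, dict) and "value" in doc

# Flattening the two array levels gives one record per inner object.
records = [obj for inner_list in doc for obj in inner_list]

print(type(doc).__name__, has_value_key, len(records))
```

If that is indeed the shape of the file, the schema would need to describe the nested arrays directly (or the file would need to be parsed as text first) rather than wrapping them in a "value" field. This is an assumption about the cause, not something confirmed against the actual file.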