Re: cant read json file with just 1,75 MiB ?

koushiknpvs · ‎05-16-2024

This can be resolved by redefining the schema structure explicitly and using that schema to read the file.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType

# Define the schema according to the JSON structure
schema = StructType([
StructField("field1", StringType(), True),
StructField("field2", IntegerType(), True),
# Add fields according to the JSON structure
])

# Read the JSON file with the defined schema
df = spark.read.schema(schema).json('dbfs:/mnt/makro/bronze/json_ssb/07129_20240514.json')
df.show()