cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

cant read json file with just 1,75 MiB ?

NTRT
New Contributor III

Hi,

I am realtively new on databricks, although I am conscious about lazy evaluation, transformations and actions and peristence.

I have a json file (complex-nested) with about 1,73 MiB. 

when 

df = spark.read.option("multiLine""false").json('dbfs:/mnt/makro/bronze/json_ssb/07129_20240514.json'), spark goes on forever without finishing the job. eventually i get an error "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached."
 
Reading this file on my local computer is a no braniner !
 
you kan get the file if you send a post request to:
 
table_07129 = "https://data.ssb.no/api/v0/no/table/07129/"
query_07129 ={"query":[],"response":{"format":"json-stat2"}}
resultat = requests.post(table_07129, json = query_07129)
 
I am using a multi node (max 2 workers) 64GB 16 core each standard d16ads_v5 cluster
 
thanks for your help. 
 

 

2 REPLIES 2

koushiknpvs
New Contributor III

This can be resolved by redefining the schema structure explicitly and using that schema to read the file. 

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType

# Define the schema according to the JSON structure
schema = StructType([
StructField("field1", StringType(), True),
StructField("field2", IntegerType(), True),
# Add fields according to the JSON structure
])

# Read the JSON file with the defined schema
df = spark.read.schema(schema).json('dbfs:/mnt/makro/bronze/json_ssb/07129_20240514.json')
df.show()

NTRT
New Contributor III

thanks for your reply. In my case I ll need to read different json files in a loop. they have not the same scheme , how to proceed in that case? thanks

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group