05-15-2024 12:15 PM - edited 05-16-2024 12:34 AM
Hi,
I am relatively new to Databricks, although I am familiar with lazy evaluation, transformations and actions, and persistence.
I have a complex, nested JSON file of about 1.73 MiB.
When I run
df = spark.read.option("multiLine", "false").json('dbfs:/mnt/makro/bronze/json_ssb/07129_20240514.json')
Spark runs indefinitely without finishing the job. Eventually I get the error "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached."
Reading this file on my local computer is a no-brainer!
You can get the file by sending a POST request to:
import requests
table_07129 = "https://data.ssb.no/api/v0/no/table/07129/"
query_07129 = {"query": [], "response": {"format": "json-stat2"}}
resultat = requests.post(table_07129, json=query_07129)
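For anyone who wants to reproduce this, here is a minimal sketch of fetching the table and saving the raw response to disk. It uses urllib from the standard library rather than requests so that it runs without extra packages. The helper names save_payload and download_table, the output file name and the 60-second timeout are assumptions made for illustration, not part of the SSB API.

```python
import json
import urllib.request
from pathlib import Path

TABLE_URL = "https://data.ssb.no/api/v0/no/table/07129/"
QUERY = {"query": [], "response": {"format": "json-stat2"}}


def save_payload(payload: bytes, path: Path) -> dict:
    """Write the raw response bytes to disk and return the parsed JSON."""
    path.write_bytes(payload)
    return json.loads(payload)


def download_table(path: Path, timeout: float = 60.0) -> dict:
    """POST the query to the table endpoint and store the response at `path`.

    The timeout value is an arbitrary choice for this sketch.
    """
    request = urllib.request.Request(
        TABLE_URL,
        data=json.dumps(QUERY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = response.read()
    return save_payload(payload, path)
```

Calling download_table(Path("07129.json")) (a hypothetical local file name) should leave a copy of the file on disk that can then be uploaded to DBFS to reproduce the problem.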
I am using a multi-node cluster (max 2 workers) of Standard_D16ads_v5 nodes, each with 64 GB of memory and 16 cores.
Thanks for your help.