Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AmineHY
by Contributor
  • 8730 Views
  • 5 replies
  • 6 kudos

Resolved! How to read JSON files embedded in a list of lists?

Hello, I am trying to read this JSON file but didn't succeed. You can see the head of the file: JSON inside a list of lists. Any idea how to read this file?

Latest Reply
adriennn
Contributor II
  • 6 kudos

The correct way to do this without using open (which works only with local/mounted files) is to read the files as binaryFile; you then get the entire JSON string on each row, and from there you can use from_json() and explode() to extract the ...
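A minimal PySpark sketch of the binaryFile route described above; the file path and the element fields in the schema are assumptions, not taken from the original thread.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Each file arrives as one row; cast the binary content to a JSON string.
raw = (spark.read.format("binaryFile")
       .load("dbfs:/data/*.json")                          # hypothetical location
       .select(F.col("content").cast("string").alias("json_str")))

schema = "array<array<struct<id:string, value:double>>>"   # assumed layout
parsed = (raw.select(F.from_json("json_str", schema).alias("outer"))
             .select(F.explode("outer").alias("inner"))    # outer list -> rows
             .select(F.explode("inner").alias("obj"))      # inner list -> rows
             .select("obj.*"))
```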

4 More Replies
PK225
by New Contributor III
  • 1254 Views
  • 2 replies
  • 1 kudos
Latest Reply
Vartika
Moderator
  • 1 kudos

Hi @Pavan Kumar, hope you are well. Just wanted to see if you were able to find an answer to your question, and would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

1 More Reply
rusty9876543
by New Contributor II
  • 6152 Views
  • 5 replies
  • 2 kudos

Split dataFrame into 1MB chunks and create a single json array with each row in chunk being an array element

Hi, I have a DataFrame that I've been able to convert into a struct with each row being a JSON object. I want the ability to split the DataFrame into 1MB chunks. Once I have the chunks, I would like to add all rows in each respective chunk into a sin...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Tamoor Mirza: You can use the to_json method of a DataFrame to convert each chunk to a JSON string, and then append those JSON strings to a list. Here is an example code snippet that splits a DataFrame into 1MB chunks and creates a list of JSON arr...
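A minimal sketch of the idea, using DataFrame.toJSON() (a close relative of the to_json the reply mentions) and assuming the serialized rows fit on the driver; the input path is hypothetical and the 1 MB target is approximate.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.json("dbfs:/data/input.json")       # hypothetical source

json_rows = df.toJSON().collect()                   # one JSON string per row
avg_row_bytes = max(1, sum(len(r) for r in json_rows) // max(1, len(json_rows)))
rows_per_chunk = max(1, (1024 * 1024) // avg_row_bytes)

# Each chunk becomes a single JSON array whose elements are the chunk's rows.
chunks = ["[" + ",".join(json_rows[i:i + rows_per_chunk]) + "]"
          for i in range(0, len(json_rows), rows_per_chunk)]
```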

4 More Replies
vicusbass
by New Contributor II
  • 13575 Views
  • 3 replies
  • 1 kudos

How to extract values from JSON array field?

Hi, while creating a SQL notebook, I am struggling with extracting some values from a JSON array field. I need to create a view where a field would be an array with values extracted from a field like the one below; specifically I need the `value` fi...
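One hedged way to approach this with from_json in Spark SQL; the table name, column name, and the key/value element fields are assumptions based on the post, not the actual data.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tiny stand-in for the real table: one row with a JSON array in a string cell.
spark.createDataFrame(
    [('[{"key": "a", "value": "1"}, {"key": "b", "value": "2"}]',)], ["payload"]
).createOrReplaceTempView("source_table")

# Extracting a field from an array of structs yields an array of that field.
spark.sql("""
    SELECT from_json(payload, 'array<struct<key:string, value:string>>').value
           AS `values`
    FROM source_table
""").show(truncate=False)
```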

Latest Reply
vicusbass
New Contributor II
  • 1 kudos

Maybe I didn't explain it correctly. The JSON snippet from the description is a cell from a row in a table.

2 More Replies
kk007
by New Contributor III
  • 2709 Views
  • 4 replies
  • 4 kudos

Photon engine throws error "JSON document exceeded maximum allowed size 400.0 MiB"

I am reading an 83MB JSON file using spark.read.json(storage_path). When I display the data it seems fine, but when I run a count it complains about the file size being more than 400MB, which is not true. Photon JSON reader erro...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Kamal Kumar: The error message suggests that the JSON document size is exceeding the maximum allowed size of 400MB. This could be caused by one or more documents in your JSON file being larger than this limit. It is not a bug, but a limitation set ...

3 More Replies
Galdino
by New Contributor II
  • 4326 Views
  • 3 replies
  • 1 kudos

How to read a json from BytesIO with PySpark?

I want to read JSON from an IO variable using PySpark. My code using pandas:
io = BytesIO()
ftp.retrbinary('RETR ' + file_name, io.write)
io.seek(0)
# With pandas
df = pd.read_json(io)
What I tried using PySpark doesn't work:
io = BytesIO()
ftp.retrbinary('...

Latest Reply
Erik_L
Contributor II
  • 1 kudos

Just use pandas and follow with spark.createDataFrame(df)
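A minimal sketch of that suggestion: parse on the driver with pandas, then hand the result to Spark. The FTP host, credentials, and file name are placeholders.

```python
from ftplib import FTP
from io import BytesIO

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

ftp = FTP("ftp.example.com")                 # hypothetical server
ftp.login("user", "password")

buf = BytesIO()
ftp.retrbinary("RETR " + "data.json", buf.write)
buf.seek(0)

pdf = pd.read_json(buf)                      # pandas handles the file-like object
df = spark.createDataFrame(pdf)              # convert the parsed frame to Spark
```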

2 More Replies
Sameer_876675
by New Contributor III
  • 4100 Views
  • 3 replies
  • 2 kudos

How to efficiently process a 100GiB JSON nested file and store it in Delta?

Hi, I'm a fairly new user and I am using Azure Databricks to process a ~1000GiB nested JSON file containing insurance policy data. I uploaded the JSON file to Azure Data Lake Gen2 storage and read the JSON file into a dataframe: df=spark.read.option("...

[Screenshots: cluster summary, OOM error]
Latest Reply
Annapurna_Hiriy
Contributor
  • 2 kudos

Hi Sameer, please refer to the following documents on how to work with nested JSON:
https://docs.databricks.com/optimizations/semi-structured.html
https://learn.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/scala/nested-json-to-dataframe.html
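A small taste of the semi-structured path syntax covered in the first link; note the `:` extraction operator is Databricks SQL specific, and the table and field names here are invented for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame(
    [('{"policy": {"id": "p1", "premium": 100}}',)], ["raw"]
).createOrReplaceTempView("policies")

# `:` walks into the JSON string; `::` casts the extracted value.
spark.sql("""
    SELECT raw:policy.id AS policy_id,
           raw:policy.premium::int AS premium
    FROM policies
""").show()
```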

2 More Replies
AndriusVitkausk
by New Contributor III
  • 1318 Views
  • 1 reply
  • 0 kudos

Reading multi-dimensional json files

So I've been having some issues reading a JSON file that's been provided to the business with another nesting layer, so instead of the JSON being an 'array of objects' -> [ {}, {}, {} ], it's an 'array of arrays of objects' -> [ [ {}, {}, {} ], [ {}, {}...

Latest Reply
ashish1
New Contributor III
  • 0 kudos

You can use the explode function to flatten the array to rows. Can you post a simple example of your data?
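A small sketch of explode flattening an array-of-arrays column into one row per object; the DataFrame is built inline and the field names are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [([[("a", 1), ("b", 2)], [("c", 3)]],)],
    "nested array<array<struct<k:string,v:int>>>",
)

flat = (df.select(F.explode("nested").alias("inner"))   # outer array -> rows
          .select(F.explode("inner").alias("obj"))      # inner array -> rows
          .select("obj.k", "obj.v"))
flat.show()
```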

dulu
by New Contributor III
  • 2805 Views
  • 2 replies
  • 6 kudos

Is there a function similar to split_part, json_extract_scalar?

I am using Spark SQL version 3.2.1. Is there a function that can replace split_part and json_extract_scalar?

Latest Reply
Ankush
New Contributor II
  • 6 kudos

pyspark.sql.functions.get_json_object(col, path): Extracts a JSON object from a JSON string based on the specified JSON path, and returns a JSON string of the extracted object. It will return null if the input JSON string is invalid.
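A quick demonstration of get_json_object, plus split(...).getItem(...) as a stand-in for split_part on Spark 3.2 (the sample data here is invented).

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('{"a": {"b": "hello"}}', "x-y-z")], ["js", "s"])

df.select(
    F.get_json_object("js", "$.a.b").alias("scalar"),   # like json_extract_scalar
    F.split("s", "-").getItem(1).alias("part2"),        # like split_part(s, '-', 2)
).show()
```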

1 More Reply
Gilg
by Contributor II
  • 4623 Views
  • 4 replies
  • 5 kudos

Avro Deserialization from Event Hub capture and Autoloader

Hi All, I am getting data from Event Hub capture in Avro format and using Auto Loader to process it. I got to the point where I can read the Avro by casting the Body into a string. Now I want to deserialize the Body column so it will be in table forma...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

If you still want to go with the above approach and don't want to provide the schema manually, you can fetch a tiny batch with one record and build the schema into a variable using the .schema property. Once done, you can add a new Body column by providin...
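A rough sketch of that idea, assuming the capture files are Avro with a JSON payload in Body; all paths here are placeholders, not values from the thread.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
capture_path = "dbfs:/eventhub/capture"              # hypothetical

# 1) Infer the Body schema from a single record read as a static batch.
one = (spark.read.format("avro").load(capture_path)
       .select(F.col("Body").cast("string").alias("Body")).limit(1))
body_schema = spark.read.json(one.rdd.map(lambda r: r.Body)).schema

# 2) Reuse that schema in the Auto Loader stream to deserialize Body.
stream = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "avro")
          .option("cloudFiles.schemaLocation", "dbfs:/eventhub/_schema")
          .load(capture_path)
          .withColumn("Body", F.from_json(F.col("Body").cast("string"), body_schema)))
```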

3 More Replies
antonyj453
by New Contributor II
  • 2037 Views
  • 1 reply
  • 3 kudos

How to extract a JSON object from a PySpark DataFrame? I was able to extract data from another column in array format using the explode function, but explode is not working for the object type; it returns a type mismatch error.

I have tried the below code to extract data that is in an array: df2 = df_deidentifieddocuments_tst.select(F.explode('annotationId').alias('annotationId')).select('annotationId.$oid') It was working fine, but it's not working for the JSON object type. Below is colu...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Did you try extracting that column's data using the from_json function?
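A minimal illustration of from_json on a column holding a JSON object; the `$oid` field mirrors the post, while everything else is invented.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('{"$oid": "64b0c2f1aa11"}',)], ["annotationId"])

# Build the schema programmatically so the "$" in the field name is no issue.
schema = StructType([StructField("$oid", StringType())])
parsed = df.withColumn("annotationId", F.from_json("annotationId", schema))
parsed.select(F.col("annotationId").getField("$oid").alias("oid")).show()
```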

Sujitha
by Community Manager
  • 1815 Views
  • 6 replies
  • 5 kudos

KB Feedback Discussion

In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. Thes...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

Thanks for sharing @Sujitha Ramamoorthy

5 More Replies
SRK
by Contributor III
  • 2996 Views
  • 4 replies
  • 7 kudos

How to handle schema validation for Json file. Using Databricks Autoloader?

Following are the details of the requirement:
1. I am using a Databricks notebook to read data from a Kafka topic and write into an ADLS Gen2 container, i.e., my landing layer.
2. I am using Spark code to read data from Kafka and write into landing...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Swapnil Kamle, hope all is well! Just wanted to check in if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

3 More Replies
rammy
by Contributor III
  • 2576 Views
  • 3 replies
  • 11 kudos

How would I retrieve JSON data with namespaces using Spark SQL?

File.json from the below code contains huge JSON data, with each key containing a namespace prefix (this JSON file was converted from an XML file). I am able to retrieve records if the JSON does not contain namespaces, but what could be the approach to retrieve record...

Latest Reply
SS2
Valued Contributor
  • 11 kudos

In case of a struct you can use dot notation (.) for extracting the value.
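A short example of dot extraction on structs, with backticks around keys that carry a namespace prefix such as "ns:"; the sample document and field names are invented for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.json(spark.sparkContext.parallelize(
    ['{"ns:record": {"ns:id": "1", "ns:name": "alpha"}}']
))

# Backticks let the dot path reference names containing a colon.
df.select(
    F.col("`ns:record`.`ns:id`").alias("id"),
    F.col("`ns:record`.`ns:name`").alias("name"),
).show()
```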

2 More Replies