cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

PK225
by New Contributor III
  • 784 Views
  • 2 replies
  • 1 kudos
  • 784 Views
  • 2 replies
  • 1 kudos
Latest Reply
Vartika
Moderator
  • 1 kudos

Hi @Pavan Kumar​,Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too.Cheers!

  • 1 kudos
1 More Replies
rusty9876543
by New Contributor II
  • 3951 Views
  • 5 replies
  • 2 kudos

Split dataFrame into 1MB chunks and create a single json array with each row in chunk being an array element

Hi, I have a dataFrame that I've been able to convert into a struct with each row being a JSON object.I want the ability to split the data frame into 1MB chunks. Once I have the chunks, I would like to add all rows in each respective chunk into a sin...

  • 3951 Views
  • 5 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Tamoor Mirza​ :You can use the to_json method of a DataFrame to convert each chunk to a JSON string, and then append those JSON strings to a list. Here is an example code snippet that splits a DataFrame into 1MB chunks and creates a list of JSON arr...

  • 2 kudos
4 More Replies
vicusbass
by New Contributor II
  • 5563 Views
  • 3 replies
  • 0 kudos

How to extract values from JSON array field?

Hi,While creating an SQL notebook, I am struggling with extracting some values from a JSON array field. I need to create a view where a field would be an array with values extracted from a field like the one below, specifically I need the `value` fi...

  • 5563 Views
  • 3 replies
  • 0 kudos
Latest Reply
vicusbass
New Contributor II
  • 0 kudos

Maybe I didn't explain it correctly. The JSON snippet from the description is a cell from a row from a table.

  • 0 kudos
2 More Replies
kk007
by New Contributor III
  • 1694 Views
  • 4 replies
  • 4 kudos

Photon engine throws error "JSON document exceeded maximum allowed size 400.0 MiB"

I am reading a 83MB json file using " spark.read.json(storage_path)", when I display the data is seems displaying fine, but when I try command line count, it complains about file size , being more than 400MB, which is not true.Photon JSON reader erro...

  • 1694 Views
  • 4 replies
  • 4 kudos
Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Kamal Kumar​ :The error message suggests that the JSON document size is exceeding the maximum allowed size of 400MB. This could be caused by one or more documents in your JSON file being larger than this limit. It is not a bug, but a limitation set ...

  • 4 kudos
3 More Replies
Galdino
by New Contributor II
  • 3193 Views
  • 3 replies
  • 1 kudos

How to read a json from BytesIO with PySpark?

I want read a json from IO variable using PySpark.My code using pandas:io = BytesIO()ftp.retrbinary('RETR '+ file_name, io.write)io.seek(0)# With pandasdf = pd.read_json(io)What I tried using PySpark, but don't work: io = BytesIO() ftp.retrbinary('...

  • 3193 Views
  • 3 replies
  • 1 kudos
Latest Reply
Erik_L
Contributor II
  • 1 kudos

Just use pandas and follow with spark.createDataFrame(df)

  • 1 kudos
2 More Replies
Sameer_876675
by New Contributor III
  • 2600 Views
  • 3 replies
  • 2 kudos

How to efficiently process a 100GiB JSON nested file and store it in Delta?

Hi, I'm a fairly new user and I am using Azure Databricks to process a ~1000GiB JSON nested file containing insurance policy data. I uploaded the JSON file to Azure Data Lake Gen2 storage and read the JSON file into a dataframe.df=spark.read.option("...

Cluster Summary OOM Error
  • 2600 Views
  • 3 replies
  • 2 kudos
Latest Reply
Annapurna_Hiriy
New Contributor III
  • 2 kudos

Hi Sameer, please refer to following documents on how to work with nested json:https://docs.databricks.com/optimizations/semi-structured.htmlhttps://learn.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/scala/nested-json-to-dataframe.html

  • 2 kudos
2 More Replies
AndriusVitkausk
by New Contributor III
  • 869 Views
  • 1 replies
  • 0 kudos

Reading multi-dimensional json files

So I've been having some issues reading a json file that's been provided to the business with another nesting layer, so instead of a json being an:'array of objects' -> [ {} ,{} ,{} ] It's an 'array of arrays of objects' -> [ [ {}, {} ,{} ], [ {} ,{}...

  • 869 Views
  • 1 replies
  • 0 kudos
Latest Reply
ashish1
New Contributor III
  • 0 kudos

You can use the explode function to flatten the array to rows, can you post a simple example of your data?

  • 0 kudos
dulu
by New Contributor III
  • 1650 Views
  • 3 replies
  • 6 kudos

Is there a function similar to split_part, json_extract_scalar?

I am using spark_sql version 3.2.1. Is there a function that can replacesplit_part,json_extract_scalarare not?

  • 1650 Views
  • 3 replies
  • 6 kudos
Latest Reply
Ankush
New Contributor II
  • 6 kudos

pyspark.sql.functions.get_json_object(col, path)[source]Extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It will return null if the input json string is invalid.​

  • 6 kudos
2 More Replies
Gilg
by Contributor II
  • 3090 Views
  • 5 replies
  • 5 kudos

Avro Deserialization from Event Hub capture and Autoloader

Hi All,I am getting data from Event Hub capture in Avro format and using Auto Loader to process it.I get into the point where I can read the Avro by casting the Body into a string.Now I wanted to deserialized the Body column so it will in table forma...

image image
  • 3090 Views
  • 5 replies
  • 5 kudos
Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @Gil Gonong​, We haven’t heard from you since the last response from @Uma Maheswara Rao Desula​ and I was checking back to see if her suggestions helped you.Or else, If you have any solution, please share it with the community, as it can be helpfu...

  • 5 kudos
4 More Replies
antonyj453
by New Contributor II
  • 1298 Views
  • 1 replies
  • 3 kudos

How to extract JSON object from a pyspark data frame. I was able to extract data from another column which in array format using "Explode" function, but Explode is not working for Object type. Its returning with type mismatch error.

I have tried below code to extract data which in Array:df2 = df_deidentifieddocuments_tst.select(F.explode('annotationId').alias('annotationId')).select('annotationId.$oid')It was working fine.. but,its not working for JSON object type. Below is colu...

CreateaAT
  • 1298 Views
  • 1 replies
  • 3 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Did you try extracting that column data using from_json function ?

  • 3 kudos
Sujitha
by Community Manager
  • 983 Views
  • 6 replies
  • 5 kudos

KB Feedback Discussion  In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers ...

KB Feedback Discussion In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting.Thes...

  • 983 Views
  • 6 replies
  • 5 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

Thanks for sharing @Sujitha Ramamoorthy​ 

  • 5 kudos
5 More Replies
SRK
by Contributor III
  • 1940 Views
  • 4 replies
  • 7 kudos

How to handle schema validation for Json file. Using Databricks Autoloader?

Following are the details of the requirement:1.      I am using databricks notebook to read data from Kafka topic and writing into ADLS Gen2 container i.e., my landing layer.2.      I am using Spark code to read data from Kafka and write into landing...

  • 1940 Views
  • 4 replies
  • 7 kudos
Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Swapnil Kamle​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Than...

  • 7 kudos
3 More Replies
rammy
by Contributor III
  • 1637 Views
  • 3 replies
  • 11 kudos

How would i retrieve data JSON data with namespaces using spark SQL?

File.json from the below code contains huge JSON data with each key containing namespace prefix(This JSON file converted from the XML file).I could able to retrieve if JSON does not contain namespaces but what could be the approach to retrieve record...

image.png image
  • 1637 Views
  • 3 replies
  • 11 kudos
Latest Reply
SS2
Valued Contributor
  • 11 kudos

I case of struct you can use (.) For extracting the value

  • 11 kudos
2 More Replies
hare
by New Contributor III
  • 7695 Views
  • 1 replies
  • 1 kudos

Failed to merge incompatible data types

We are processing the josn file from the storage location on every day and it will get archived once the records are appended into the respective tables.source_location_path: "..../mon=05/day=01/fld1" , "..../mon=05/day=01/fld2" ..... "..../mon=05/d...

  • 7695 Views
  • 1 replies
  • 1 kudos
Latest Reply
Shalabh007
Honored Contributor
  • 1 kudos

@Hare Krishnan​ the issues highlighted can easily be handled using the .option("mergeSchema", "true") at the time of reading all the files.Sample code:spark.read.option("mergeSchema", "true").json(<file paths>, multiLine=True)The only scenario this w...

  • 1 kudos
Labels