Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Read JSON with backslashes

Orianh
Valued Contributor II

Hello guys.

I'm trying to read a JSON file that contains backslashes, and PySpark fails to read it.

I've tried a lot of options but haven't solved this yet. I thought about reading the whole JSON as text and replacing every "\" with "/" (see the sketch after this post), but PySpark fails to read it as text too.

Example JSON:

{
  "fname": "max",
  "lname": "tom",
  "path": "c\\dir1\\dir2"
}

Code that I tried:

df = spark.read.option('mode', 'PERMISSIVE').option('columnNameOfCorruptRecord', '_corrupt_record').json('path_to_json', multiLine=True)

df = spark.read.text('path_to_json')

With the first example, when I don't specify a schema I get an "unable to infer schema" error, and if I do specify one I get "Query returned no result".

With the second example I also get "Query returned no result".

The file at the path does contain JSON data, but because of the path field PySpark fails to read it as valid JSON.

(If there is a way to drop the path field while reading the JSON, I don't mind doing that, but I didn't find any information on how to achieve it.)

Hope someone can help me out.

Thanks!
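For context on the workaround the post sketches, here is a minimal, hypothetical example of "escape the stray backslashes, then parse": the raw string stands in for the file contents, and the regex doubles any backslash that does not begin a valid JSON escape.

    import json
    import re

    # Stand-in for the file contents: the path value holds lone backslashes,
    # which makes the text invalid JSON as-is.
    raw = '{"fname": "max", "lname": "tom", "path": "c\\dir1\\dir2"}'

    # Double every backslash that does not start a valid JSON escape,
    # then parse with the standard library.
    fixed = re.sub(r'\\(?![\\/"bfnrtu])', r'\\\\', raw)
    data = json.loads(fixed)
    print(data['path'])  # c\dir1\dir2

The same substitution could be applied to text read with spark.read.text, once the file is actually reachable from Spark.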


7 REPLIES

Kaniz_Fatma
Community Manager

Hi @Orianh! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise, I will get back to you soon. Thanks.

Kaniz_Fatma
Community Manager

Hi @Orianh,

This worked for me.


Please let me know if this helps.

%python
ds = spark.read.option("multiline", "true").json("path_to_the_json_file")
ds.printSchema()
ds.show()
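If the multiline read still comes back empty, one way to see which rows Spark could not parse is to keep them in a corrupt-record column and inspect it. A sketch, reusing the same placeholder path; note that if every record is corrupt and no schema is supplied, inference can still fail with the same "unable to infer schema" error:

    # Keep unparsable rows in a _corrupt_record column instead of dropping them.
    df = (spark.read
          .option("multiline", "true")
          .option("mode", "PERMISSIVE")
          .option("columnNameOfCorruptRecord", "_corrupt_record")
          .json("path_to_the_json_file"))

    # Cache first: Spark disallows queries that reference only the
    # internal corrupt-record column of a raw JSON read.
    df.cache()
    df.filter("_corrupt_record is not null").show(truncate=False)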

jose_gonzalez
Moderator

Hi @orian hindi,

Please let us know if @Kaniz Fatma's solution worked for you, and if it did, select it as the best answer. If not, please provide more details and we will help you solve the error.

Orianh
Valued Contributor II

Thanks for the response, I managed to solve this by myself 😀

Awesome, Thanks.

Anonymous
Not applicable

@orian hindi - Would you be happy to post the solution you came up with and then mark it as best? That will help other members. 😎

Orianh
Valued Contributor II

I did it with boto3 instead of PySpark, since it's not a lot of files.

    import json
    import re

    import boto3
    import pandas as pd
    from pandas import json_normalize

    # JARVIS_BUCKET, prefix, and ANCHOR_PATTERN are defined elsewhere
    # in the notebook.
    jsons_data = []
    client = boto3.client('s3')
    s3_resource = boto3.resource('s3')
    bucket = s3_resource.Bucket(JARVIS_BUCKET)
    for obj in bucket.objects.filter(Prefix=prefix):
        file_name = obj.key
        # Only parse the objects whose key matches the anchor pattern.
        if re.search(ANCHOR_PATTERN, file_name):
            json_obj = client.get_object(Bucket=JARVIS_BUCKET, Key=file_name)
            body = json_obj['Body']
            json_string = body.read().decode('utf-8')
            # strict=False lets json.loads accept literal control characters
            # inside strings instead of raising a decode error.
            jsons_data.append(json_normalize(json.loads(json_string, strict=False)))

    df = pd.concat(jsons_data)
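For anyone who later needs the same thing at larger scale, a rough pure-PySpark sketch of the same idea; it assumes the post's JARVIS_BUCKET, prefix, and ANCHOR_PATTERN names, and that s3:// paths resolve on the cluster (some setups need s3a://):

    import json
    import re

    import pandas as pd
    from pandas import json_normalize

    # Read every object under the prefix as (path, full_text) pairs,
    # keep the ones whose path matches the anchor pattern, and parse
    # them leniently on the executors. Note the first element is the
    # full URI, not just the key, so the pattern may need adjusting.
    rdd = sc.wholeTextFiles(f's3://{JARVIS_BUCKET}/{prefix}')
    records = (rdd
               .filter(lambda kv: re.search(ANCHOR_PATTERN, kv[0]) is not None)
               .map(lambda kv: json.loads(kv[1], strict=False))
               .collect())

    df = pd.concat(json_normalize(r) for r in records)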
