<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Read JSON with backslash. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13199#M7913</link>
    <description>&lt;P&gt;@orian hindi - Would you be happy to post the solution you came up with and then mark it as best? That will help other members. &lt;span class="lia-unicode-emoji" title=":smiling_face_with_sunglasses:"&gt;😎&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 11 Nov 2021 16:48:53 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2021-11-11T16:48:53Z</dc:date>
    <item>
      <title>Read JSON with backslash.</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13193#M7907</link>
      <description>&lt;P&gt;Hello guys.&lt;/P&gt;&lt;P&gt;I'm trying to read a JSON file that contains backslashes, and PySpark fails to read it.&lt;/P&gt;&lt;P&gt;I've tried a lot of options but haven't solved this yet. I thought about reading the whole JSON as text and replacing every "\" with "/", but PySpark fails to read it as text too.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Example JSON:&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;"fname": "max",&lt;/P&gt;&lt;P&gt;"lname" :" tom",&lt;/P&gt;&lt;P&gt;"path ": " c\\dir1\\dir2"&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;Code that I tried:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Attempt 1: read as multi-line JSON, keeping malformed rows in _corrupt_record
df = spark.read.option('mode','PERMISSIVE').option('columnNameOfCorruptRecord', '_corrupt_record').json('path_to_json', multiLine=True)

# Attempt 2: read the file as plain text
df = spark.read.text('path_to_json')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;With the first code example, when I don't specify the schema I get an "unable to infer schema" error, and if I specify it I get "Query returned no result".&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;With the second code example I get "Query returned no result".&lt;/P&gt;&lt;P&gt;The path does contain the JSON data, but because of the path field PySpark fails to read it as valid JSON.&lt;/P&gt;&lt;P&gt;(If there is a way to drop the path field while reading the JSON, I don't mind doing that, but I didn't find any information on how to achieve it.)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope someone can help me out.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 17 Oct 2021 11:55:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13193#M7907</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2021-10-17T11:55:24Z</dc:date>
    </item>
    <item>
      <title>Re: Read JSON with backslash.</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13196#M7910</link>
      <description>&lt;P&gt;Hi @orian hindi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please let us know if @Kaniz Fatma's solution worked for you, and if so, select it as the best answer. If not, please provide more details and we will help you resolve the error.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Oct 2021 16:58:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13196#M7910</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-10-21T16:58:29Z</dc:date>
    </item>
    <item>
      <title>Re: Read JSON with backslash.</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13197#M7911</link>
      <description>&lt;P&gt;Thanks for the response, I managed to solve this by myself &lt;span class="lia-unicode-emoji" title=":grinning_face:"&gt;😀&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Nov 2021 12:25:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13197#M7911</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2021-11-11T12:25:59Z</dc:date>
    </item>
    <item>
      <title>Re: Read JSON with backslash.</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13199#M7913</link>
      <description>&lt;P&gt;@orian hindi - Would you be happy to post the solution you came up with and then mark it as best? That will help other members. &lt;span class="lia-unicode-emoji" title=":smiling_face_with_sunglasses:"&gt;😎&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Nov 2021 16:48:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13199#M7913</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-11T16:48:53Z</dc:date>
    </item>
    <item>
      <title>Re: Read JSON with backslash.</title>
      <link>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13200#M7914</link>
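      <description>&lt;P&gt;I did it with boto3 instead of PySpark since there aren't a lot of files.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import json
import re

import boto3
import pandas as pd
from pandas import json_normalize

# JARVIS_BUCKET, prefix and ANCHOR_PATTERN are defined elsewhere (not shown here).
jsons_data = []
client = boto3.client('s3')
s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(JARVIS_BUCKET)
for obj in bucket.objects.filter(Prefix=prefix):
    file_name = obj.key
    # Only load the objects whose key matches the pattern.
    if re.search(ANCHOR_PATTERN, file_name):
        json_obj = client.get_object(Bucket=JARVIS_BUCKET, Key=file_name)
        body = json_obj['Body']
        json_string = body.read().decode('utf-8')
        # strict=False lets json.loads accept control characters inside strings.
        jsons_data.append(json_normalize(json.loads(json_string, strict=False)))

# Combine the per-file frames into a single DataFrame.
df = pd.concat(jsons_data)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>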
      <pubDate>Mon, 15 Nov 2021 08:58:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-json-with-backslash/m-p/13200#M7914</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2021-11-15T08:58:55Z</dc:date>
    </item>
  </channel>
</rss>

