<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading JSON from Databricks Workspace in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/91996#M38313</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105089"&gt;@Saf4Databricks&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;As you said, you probably need to add multiline options to make it work. You can use this option when creating temporary view or using pyspark api. Below is example of creating temporary view:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE TEMPORARY VIEW multilineJson
USING json
OPTIONS (path="file:/Workspace/Users/myusername@outlook.com/myJsonFile_in_Workspace.json",multiline=true)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 27 Sep 2024 06:42:17 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2024-09-27T06:42:17Z</dc:date>
    <item>
      <title>Reading JSON from Databricks Workspace</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/91970#M38307</link>
      <description>&lt;P&gt;I am using second example from Databricks` official document here: &lt;A title="Work with workspace files" href="https://docs.databricks.com/en/files/index.html#work-with-workspace-files" target="_blank" rel="noopener"&gt;Work with workspace files&lt;/A&gt;. But I'm getting following error:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;: What could be a cause of the error, and how can we fix it?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ERROR&lt;/STRONG&gt;: &lt;EM&gt;Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;referenced columns only include the internal corrupt record column&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;(named _corrupt_record by default)&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Code&lt;/STRONG&gt;:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;%sql
SELECT * FROM json.`file:/Workspace/Users/myusername@outlook.com/myJsonFile_in_Workspace.json`;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Json file in my Databricks Workspace&lt;/STRONG&gt;:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{
"header": {
"platform": "atm",
"version": "2.0"
},
"details": [
{
"abc": "3",
"def": "4"
},
{
"abc": "5",
"def": "6"
}
]
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Remarks&lt;/STRONG&gt;: Minimizing JSON to single file is one possibility. The json used in my post is for explaining the question only. The actual Json that I am using is quite large and complex - and in such cases, Apache Spark's &lt;A title="official document" href="https://spark.apache.org/docs/latest/sql-data-sources-json.html" target="_blank" rel="noopener"&gt;official document&lt;/A&gt; recommends: `For a regular multi-line JSON file, set the multiline parameter to True` - as shown in &lt;A title="this example" href="https://stackoverflow.com/a/38552139/1232087" target="_blank" rel="noopener"&gt;this example&lt;/A&gt;. But I'm not sure how to use this option when you're reading json from a `Databrick Workspace` that's what my code above is doing.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Sep 2024 23:56:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/91970#M38307</guid>
      <dc:creator>Saf4Databricks</dc:creator>
      <dc:date>2024-09-26T23:56:23Z</dc:date>
    </item>
    <item>
      <title>Re: Reading JSON from Databricks Workspace</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/91996#M38313</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105089"&gt;@Saf4Databricks&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;As you said, you probably need to add multiline options to make it work. You can use this option when creating temporary view or using pyspark api. Below is example of creating temporary view:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE TEMPORARY VIEW multilineJson
USING json
OPTIONS (path="file:/Workspace/Users/myusername@outlook.com/myJsonFile_in_Workspace.json",multiline=true)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Sep 2024 06:42:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/91996#M38313</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-27T06:42:17Z</dc:date>
    </item>
    <item>
      <title>Re: Reading JSON from Databricks Workspace</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/92144#M38370</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;Thank you for sharing your thoughts. I probably should have elaborated bit more on my question. Yes, we can use Apache Spark to read a Json file as shown in the first example of the link provided in my post above. For example, in the following code, I modified their first Apache Spark example by adding&amp;nbsp;&lt;STRONG&gt;multiline&lt;/STRONG&gt; option - and it worked fine:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.read.option("multiline","true").format("json").load("file:/Workspace/Users/myusername@outlook.com/myJsonFile_in_Workspace.json").show()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;But in their second example from that same link stated that you can &lt;STRONG&gt;directly&lt;/STRONG&gt; read (query) a file from your Databricks workspace by using &lt;STRONG&gt;Spark SQL and Databricks SQL&lt;/STRONG&gt; as follows. So, my question would be if the file is multiline Json file, then how would you use multiline option in their example given below?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SELECT * FROM json.`file:/Workspace/Users/&amp;lt;user-folder&amp;gt;/file.json`;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Sep 2024 23:38:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/92144#M38370</guid>
      <dc:creator>Saf4Databricks</dc:creator>
      <dc:date>2024-09-27T23:38:01Z</dc:date>
    </item>
    <item>
      <title>Re: Reading JSON from Databricks Workspace</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/92153#M38373</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105089"&gt;@Saf4Databricks&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;You can try following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SELECT *
FROM json.`file:/Workspace/Users/&amp;lt;user-folder&amp;gt;/file.json`
OPTIONS(multiline,true);&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 28 Sep 2024 08:08:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/92153#M38373</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-28T08:08:16Z</dc:date>
    </item>
    <item>
      <title>Re: Reading JSON from Databricks Workspace</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/92157#M38375</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105089"&gt;@Saf4Databricks&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I tried the above and it didn't work. I think there is no way to pass this option using this syntax. At least there is no entry in documentation of how to do it. But the way I proposed in my previous post aligns with what databricks documentation is saying. So, direct equivalent of &lt;STRONG&gt;spark.read.option("multiline", "true").format("json")&lt;/STRONG&gt;&amp;nbsp; in SQL API is to create view and use OPTIONS clause to pass required parameters&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/query/formats/json" target="_blank"&gt;JSON file - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1727514594246.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11565iAEF4950FDB9DD8AA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1727514594246.png" alt="szymon_dybczak_0-1727514594246.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 28 Sep 2024 09:13:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-json-from-databricks-workspace/m-p/92157#M38375</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-28T09:13:48Z</dc:date>
    </item>
  </channel>
</rss>

