<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: "Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB" in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/quot-databricks-quot-quot-pyspark-quot-read-quot-json-quot-file/m-p/20173#M13606</link>
    <description>&lt;P&gt;There does not currently appear to be direct support for reading append blobs. However, converting the append blob to a block blob (and then to Parquet, Delta, etc.) is a viable option:&lt;/P&gt;&lt;P&gt;&lt;A href="https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga=2.258782666.1514035379.1665677010-653321784.1587659507" target="test_blank"&gt;https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga=2.258782666.1514035379.1665677010-653321784.1587659507&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 13 Oct 2022 16:25:54 GMT</pubDate>
    <dc:creator>User16856839485</dc:creator>
    <dc:date>2022-10-13T16:25:54Z</dc:date>
    <item>
      <title>"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"</title>
      <link>https://community.databricks.com/t5/data-engineering/quot-databricks-quot-quot-pyspark-quot-read-quot-json-quot-file/m-p/20170#M13603</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;We are receiving JSON files in an Azure Blob container, and their "Blob Type" is "Append Blob".&lt;/P&gt;&lt;P&gt;When we try to read them with the script below, we get the error "AnalysisException: Unable to infer schema for JSON. It must be specified manually.":&lt;/P&gt;&lt;P&gt;df = spark.read.json(source_location, multiLine=True, pathGlobFilter='2022-05-18T02_50_01_914Z_student.json')&lt;/P&gt;&lt;P&gt;df.createOrReplaceTempView('v_df')&lt;/P&gt;&lt;P&gt;spark.sql("select count(*) from v_df").display()&lt;/P&gt;&lt;P&gt;Could anyone please let me know whether there is any option to read JSON files that have the blob type "Append Blob"? We are using Databricks with PySpark.&lt;/P&gt;</description>
      <pubDate>Thu, 19 May 2022 12:40:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/quot-databricks-quot-quot-pyspark-quot-read-quot-json-quot-file/m-p/20170#M13603</guid>
      <dc:creator>hare</dc:creator>
      <dc:date>2022-05-19T12:40:47Z</dc:date>
    </item>
    <item>
      <title>Re: "Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"</title>
      <link>https://community.databricks.com/t5/data-engineering/quot-databricks-quot-quot-pyspark-quot-read-quot-json-quot-file/m-p/20173#M13606</link>
      <description>&lt;P&gt;There does not currently appear to be direct support for reading append blobs. However, converting the append blob to a block blob (and then to Parquet, Delta, etc.) is a viable option:&lt;/P&gt;&lt;P&gt;&lt;A href="https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga=2.258782666.1514035379.1665677010-653321784.1587659507" target="test_blank"&gt;https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga=2.258782666.1514035379.1665677010-653321784.1587659507&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2022 16:25:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/quot-databricks-quot-quot-pyspark-quot-read-quot-json-quot-file/m-p/20173#M13606</guid>
      <dc:creator>User16856839485</dc:creator>
      <dc:date>2022-10-13T16:25:54Z</dc:date>
    </item>
  </channel>
</rss>

