<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading empty json file in serverless gives error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/138059#M50853</link>
    <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/60098"&gt;@K_Anudeep&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't have a support subscription.&lt;/P&gt;</description>
    <pubDate>Fri, 07 Nov 2025 04:45:32 GMT</pubDate>
    <dc:creator>Dhruv-22</dc:creator>
    <dc:date>2025-11-07T04:45:32Z</dc:date>
    <item>
      <title>Reading empty json file in serverless gives error</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/136795#M50649</link>
      <description>&lt;P&gt;I ran a Databricks notebook to do incremental loads from files in the raw layer to bronze layer tables. Today, I encountered a case where the delta file was empty. I tried running it manually on serverless compute and encountered an error.&lt;/P&gt;&lt;PRE&gt;df = spark.read.json(path)&lt;BR /&gt;df.display()&lt;BR /&gt;&lt;BR /&gt;-- Output&lt;BR /&gt;&lt;SPAN&gt;Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only &lt;BR /&gt;include the internal corrupt record column (named _corrupt_record by default). For example: &lt;BR /&gt;spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count() and &lt;BR /&gt;spark.read.schema(schema).csv(file).select("_corrupt_record").show(). Instead, you can cache or &lt;BR /&gt;save the parsed results and then send the same query. &lt;BR /&gt;For example, val df = spark.read.schema(schema).csv(file).cache() and then &lt;BR /&gt;df.filter($"_corrupt_record".isNotNull).count().&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;I tried caching, but it isn't allowed on serverless compute.&lt;/P&gt;&lt;PRE&gt;&lt;SPAN&gt;[&lt;/SPAN&gt;&lt;A class="" href="https://learn.microsoft.com/azure/databricks/error-messages/error-classes#not_supported_with_serverless" target="_blank" rel="noopener noreferrer"&gt;NOT_SUPPORTED_WITH_SERVERLESS&lt;/A&gt;&lt;SPAN&gt;]&lt;/SPAN&gt;&lt;SPAN&gt; PERSIST TABLE is not supported on serverless compute. SQLSTATE: 0A000&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;I have the following questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Why does this issue occur only on serverless compute? I tried using all-purpose compute with 15.4 LTS and it created an empty dataframe.&lt;/LI&gt;&lt;LI&gt;Is there a way to display the dataframe to see what exactly the corrupt record is? I tried &lt;EM&gt;collect, select('*', lit('c'))&lt;/EM&gt; but it didn't work.&lt;/LI&gt;&lt;LI&gt;Is there a way on serverless compute to tolerate empty files?&lt;/LI&gt;&lt;/OL&gt;</description>
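A note on question 2: Spark surfaces _corrupt_record as the only column precisely when it could not infer any real fields from the file, so one cheap guard before display() or collect() is to test the parsed schema for exactly that shape and skip the file instead of triggering the error. A minimal sketch; the helper name only_corrupt_record is made up for illustration and is not a Spark API:

```python
def only_corrupt_record(columns, corrupt_col="_corrupt_record"):
    """Return True when the inferred schema holds nothing but the
    corrupt-record column, i.e. Spark found no real fields to parse."""
    return list(columns) == [corrupt_col]


# Typical use before df.display(): branch on the schema, which is safe to
# inspect even on serverless, instead of materialising the rows.
assert only_corrupt_record(["_corrupt_record"]) is True
assert only_corrupt_record(["id", "name"]) is False
```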
      <pubDate>Thu, 30 Oct 2025 17:40:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/136795#M50649</guid>
      <dc:creator>Dhruv-22</dc:creator>
      <dc:date>2025-10-30T17:40:13Z</dc:date>
    </item>
    <item>
      <title>Re: Reading empty json file in serverless gives error</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/136930#M50667</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99515"&gt;@Dhruv-22&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Can you share the schema of the df? Do you have a &lt;STRONG&gt;_corrupt_record&lt;/STRONG&gt; column in your dataframe? If yes, where is it coming from? You said it's an empty file, correct?&lt;/LI&gt;
&lt;LI&gt;By design, Spark blocks queries that only reference the &lt;STRONG&gt;&lt;CODE&gt;_corrupt_record&lt;/CODE&gt;&lt;/STRONG&gt; column from raw JSON/CSV and throws an error if it is explicitly accessed. In your case you aren't doing that, yet it still throws that error, which is why we will need the schema and the explain plan of the dataframe (df.explain(true)).&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;By the way, I created an empty file on serverless and it produced an empty df as expected.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="K_Anudeep_0-1761910255579.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21215i9D62DC8CF75BD89F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="K_Anudeep_0-1761910255579.png" alt="K_Anudeep_0-1761910255579.png" /&gt;&lt;/span&gt;&lt;/P&gt;
</description>
      <pubDate>Fri, 31 Oct 2025 11:31:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/136930#M50667</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-10-31T11:31:15Z</dc:date>
    </item>
    <item>
      <title>Re: Reading empty json file in serverless gives error</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/136968#M50674</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/60098"&gt;@K_Anudeep&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Here are the details you requested.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Yes, there is a _corrupt_record column in the dataframe. It appears because Spark treats the file as containing corrupt records, and therefore generates the _corrupt_record column.&lt;/LI&gt;&lt;/UL&gt;&lt;P class="lia-indent-padding-left-60px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Dhruv22_2-1761913421664.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21223iCB3A97B1EC62A0F4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Dhruv22_2-1761913421664.png" alt="Dhruv22_2-1761913421664.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The error comes when I try to run display, collect or any such command. Here is the explain plan.&lt;/LI&gt;&lt;/UL&gt;&lt;P class="lia-indent-padding-left-60px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Dhruv22_3-1761913478812.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21224i7B831D1630C1EE76/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Dhruv22_3-1761913478812.png" alt="Dhruv22_3-1761913478812.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Also, df.count() is 1.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I checked the file size: it was 3 bytes, but it doesn't display any characters.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Dhruv22_0-1761913058957.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21221i15512EEDE410799D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Dhruv22_0-1761913058957.png" alt="Dhruv22_0-1761913058957.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;But printing the hexdump gives the following:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Dhruv22_1-1761913093525.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21222i1FB28F41463C5E06/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Dhruv22_1-1761913093525.png" alt="Dhruv22_1-1761913093525.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I guess this is causing the issue. Can you tell me how to deal with it? It runs fine on the all-purpose cluster, though.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Oct 2025 12:42:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/136968#M50674</guid>
      <dc:creator>Dhruv-22</dc:creator>
      <dc:date>2025-10-31T12:42:27Z</dc:date>
    </item>
    <item>
      <title>Re: Reading empty json file in serverless gives error</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137022#M50682</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99515"&gt;@Dhruv-22&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Oh, this totally makes sense now. In that case, it is a true corrupt record. You can just add the read option&amp;nbsp;&lt;STRONG&gt;DROPMALFORMED&lt;/STRONG&gt; and it should work:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;df1 = (spark.read
.format("json")
.option("mode", "DROPMALFORMED") # &amp;lt;- drops malformed record
.load(base))

df1.display()&lt;/LI-CODE&gt;
</description>
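For readers skimming the thread: DROPMALFORMED simply discards any row the JSON parser cannot decode, rather than routing it into _corrupt_record. A pure-Python sketch of the same semantics for JSON-lines input, illustrative only and not Spark's or Photon's actual parser; note that Python's json module also rejects a BOM-prefixed payload, mirroring the behaviour discussed in this thread:

```python
import json


def read_json_lines_dropmalformed(lines):
    """Mimic Spark's DROPMALFORMED read mode for JSON-lines input:
    rows that parse are kept, rows that do not are silently dropped."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # blank lines are not records at all
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            pass  # malformed record: drop it, as DROPMALFORMED would
    return rows


# A BOM-only "record" is rejected by json.loads and therefore dropped,
# as is plain garbage; only the valid row survives.
assert read_json_lines_dropmalformed(['{"a": 1}', '\ufeff', 'oops']) == [{"a": 1}]
```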
      <pubDate>Fri, 31 Oct 2025 15:04:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137022#M50682</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-10-31T15:04:44Z</dc:date>
    </item>
    <item>
      <title>Re: Reading empty json file in serverless gives error</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137281#M50721</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/60098"&gt;@K_Anudeep&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I searched some more. It is not a corrupt record. The three bytes represent a byte order mark (BOM) signaling that the file is UTF-8 encoded, which is a standard thing. Also, the file is generated automatically by a no-code pipeline (Azure Data Factory), so it is hard to argue there is an issue with the service.&lt;/P&gt;&lt;P&gt;The difference comes down to Photon. On a serverless cluster, Photon is enabled by default. On an all-purpose cluster the file reads fine, but when I enable Photon on the all-purpose cluster, it fails with the same error as on the serverless cluster.&lt;/P&gt;&lt;P&gt;So the parser Photon uses differs from normal Spark's. Could you do some more digging and find out what exactly is causing the error?&lt;/P&gt;&lt;P&gt;Thanks for the help up till now.&lt;/P&gt;&lt;P&gt;P.S. - I got the idea of Photon from ChatGPT. I tried it and found it to be true.&lt;/P&gt;</description>
      <pubDate>Sun, 02 Nov 2025 11:42:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137281#M50721</guid>
      <dc:creator>Dhruv-22</dc:creator>
      <dc:date>2025-11-02T11:42:54Z</dc:date>
    </item>
    <item>
      <title>Re: Reading empty json file in serverless gives error</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137361#M50729</link>
      <description>&lt;P&gt;Adding to my point: suppose the file consists of a valid JSON document preceded by the byte order mark, like below (the bytes ef, bb and bf are the byte order mark).&lt;/P&gt;&lt;PRE&gt;dhruv@AS-MAC-0324 Downloads % cat zone.json|hexdump -C&lt;BR /&gt;00000000  ef bb bf 7b 22 61 22 3a  20 22 61 22 7d 0a        |...{"a": "a"}.|&lt;BR /&gt;0000000e&lt;/PRE&gt;&lt;P&gt;Then a cluster with Photon enabled reads the file and gives this as the output:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Dhruv22_0-1762158303933.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21295iCEDDD9E971E33DCA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Dhruv22_0-1762158303933.png" alt="Dhruv22_0-1762158303933.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;i.e. the cluster reads the file properly. So the bug in a Photon-enabled environment is specifically that it cannot read an empty file containing only a byte order mark.&lt;/P&gt;</description>
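The ef bb bf bytes shown in the hexdump can also be detected and removed before any parser sees the file. A minimal pre-read workaround sketch, under the assumption that you can touch the raw bytes first (the function name strip_utf8_bom is made up for illustration; Python's built-in utf-8-sig codec does the equivalent when decoding text):

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark (EF BB BF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The 3-byte file from the thread is only a BOM: stripping it yields a
# genuinely empty file, which the thread shows serverless reads fine.
assert strip_utf8_bom(b"\xef\xbb\xbf") == b""
# A BOM-prefixed JSON document keeps its payload intact.
assert strip_utf8_bom(b'\xef\xbb\xbf{"a": "a"}\n') == b'{"a": "a"}\n'
```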
      <pubDate>Mon, 03 Nov 2025 08:27:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137361#M50729</guid>
      <dc:creator>Dhruv-22</dc:creator>
      <dc:date>2025-11-03T08:27:00Z</dc:date>
    </item>
    <item>
      <title>Re: Reading empty json file in serverless gives error</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137583#M50770</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99515"&gt;@Dhruv-22&lt;/a&gt;&amp;nbsp;! Thanks for the info!&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I will need to analyse this internally to pinpoint the exact root cause. I advise that you raise a support case with us so we can take a closer look. You can raise a support case using this link:&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://help.databricks.com/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://help.databricks.com/&lt;/A&gt;&lt;BR /&gt;and add a comment to assign it to me so that I can take a look and provide a detailed analysis and a fix, if any.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Nov 2025 14:38:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137583#M50770</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-11-04T14:38:31Z</dc:date>
    </item>
    <item>
      <title>Re: Reading empty json file in serverless gives error</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/138059#M50853</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/60098"&gt;@K_Anudeep&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't have a support subscription.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Nov 2025 04:45:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/138059#M50853</guid>
      <dc:creator>Dhruv-22</dc:creator>
      <dc:date>2025-11-07T04:45:32Z</dc:date>
    </item>
  </channel>
</rss>