<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Stream failure JsonParseException in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/stream-failure-jsonparseexception/m-p/54139#M6254</link>
    <description>&lt;P&gt;Hi all! I am having the following issue with a couple of PySpark streams.&lt;/P&gt;&lt;P&gt;I have several notebooks, each running an independent Structured Streaming job that reads from a Delta bronze table (gzip-compressed Parquet files) that a previous job dumped from Kinesis to S3. Each file contains events in &lt;STRONG&gt;JSON format&lt;/STRONG&gt; that need to be aggregated in different ways and then dumped to AWS S3 again (just dumped, not appended to any table). Among the events, I sometimes get a corrupted event in plain string format, which I need to filter out of the stream. Let's suppose the event is a single string that says "error_event".&lt;/P&gt;&lt;P&gt;At the beginning of the notebook, the first things I do after spark.readStream are:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;1. bronze_df.where(f.col(&lt;/SPAN&gt;&lt;SPAN&gt;"data"&lt;/SPAN&gt;&lt;SPAN&gt;) != &lt;/SPAN&gt;&lt;SPAN&gt;"error_event"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;2. Apply a schema to the data column to parse the JSON into the expected format.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;For some reason I haven't been able to figure out yet, some of the streams fail with the following error when I change my cluster mode from Photon to standard, even though they all use the same function to filter out the error events:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;Error details:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Caused by: org.apache.spark.SparkException: [MALFORMED_RECORD_IN_PARSING.WITHOUT_SUGGESTION] Malformed records are detected in record parsing: [null,null,null,null,null,null,null,null,null,null,null,null,null].

Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'error_event': was expecting (JSON String, Number (or 'NaN'/'INF'/'+INF'), Array, Object or token 'null', 'true' or 'false') at [Source: (InputStreamReader)&lt;/LI-CODE&gt;&lt;P&gt;Any ideas about what might be causing it? Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Tue, 28 Nov 2023 18:53:20 GMT</pubDate>
    <dc:creator>patojo94</dc:creator>
    <dc:date>2023-11-28T18:53:20Z</dc:date>
    <item>
      <title>Stream failure JsonParseException</title>
      <link>https://community.databricks.com/t5/get-started-discussions/stream-failure-jsonparseexception/m-p/54139#M6254</link>
      <description>&lt;P&gt;Hi all! I am having the following issue with a couple of PySpark streams.&lt;/P&gt;&lt;P&gt;I have several notebooks, each running an independent Structured Streaming job that reads from a Delta bronze table (gzip-compressed Parquet files) that a previous job dumped from Kinesis to S3. Each file contains events in &lt;STRONG&gt;JSON format&lt;/STRONG&gt; that need to be aggregated in different ways and then dumped to AWS S3 again (just dumped, not appended to any table). Among the events, I sometimes get a corrupted event in plain string format, which I need to filter out of the stream. Let's suppose the event is a single string that says "error_event".&lt;/P&gt;&lt;P&gt;At the beginning of the notebook, the first things I do after spark.readStream are:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;1. bronze_df.where(f.col(&lt;/SPAN&gt;&lt;SPAN&gt;"data"&lt;/SPAN&gt;&lt;SPAN&gt;) != &lt;/SPAN&gt;&lt;SPAN&gt;"error_event"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;2. Apply a schema to the data column to parse the JSON into the expected format.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;For some reason I haven't been able to figure out yet, some of the streams fail with the following error when I change my cluster mode from Photon to standard, even though they all use the same function to filter out the error events:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;Error details:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Caused by: org.apache.spark.SparkException: [MALFORMED_RECORD_IN_PARSING.WITHOUT_SUGGESTION] Malformed records are detected in record parsing: [null,null,null,null,null,null,null,null,null,null,null,null,null].

Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'error_event': was expecting (JSON String, Number (or 'NaN'/'INF'/'+INF'), Array, Object or token 'null', 'true' or 'false') at [Source: (InputStreamReader)&lt;/LI-CODE&gt;&lt;P&gt;Any ideas about what might be causing it? Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Tue, 28 Nov 2023 18:53:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/stream-failure-jsonparseexception/m-p/54139#M6254</guid>
      <dc:creator>patojo94</dc:creator>
      <dc:date>2023-11-28T18:53:20Z</dc:date>
    </item>
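The two steps described in the post (drop the marker rows, then parse the remaining strings as JSON) can be sketched in plain Python rather than PySpark, so it runs without a cluster. This is only an illustration under stated assumptions: the `data` column is assumed to hold one raw string per event, and `parse_event`, `filter_then_parse`, and the sample rows are hypothetical names that do not come from the original notebooks. It shows that a bare `error_event` token is not valid JSON, which matches what the Jackson parser reports; it does not explain why the behavior differs between Photon and standard clusters.

```python
import json
from typing import List, Optional

# Placeholder value taken from the post; the real corrupted payload may differ.
ERROR_MARKER = "error_event"


def parse_event(raw: str) -> Optional[dict]:
    """Return the decoded JSON object, or None when raw is not a JSON object."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        # A bare token such as error_event is not valid JSON.
        return None
    return value if isinstance(value, dict) else None


def filter_then_parse(rows: List[str]) -> List[Optional[dict]]:
    """Step 1: drop the marker rows. Step 2: parse whatever is left."""
    kept = [raw for raw in rows if raw != ERROR_MARKER]
    return [parse_event(raw) for raw in kept]


# Hypothetical sample rows standing in for the bronze table's data column.
sample_rows = [
    '{"id": 1, "kind": "click"}',
    "error_event",
    '{"id": 2, "kind": "view"}',
]
parsed = filter_then_parse(sample_rows)
```

Because `parse_event` tolerates unparseable input on its own, the result does not depend on the filter having removed every bad row first; whether an equivalent tolerant parse is appropriate in the actual stream is a design choice for the author.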
    <item>
      <title>Re: Stream failure JsonParseException</title>
      <link>https://community.databricks.com/t5/get-started-discussions/stream-failure-jsonparseexception/m-p/55128#M6256</link>
      <description>&lt;P&gt;Thank you sir for answering, that helps a lot. Please mark it as a solution.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2023 08:30:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/stream-failure-jsonparseexception/m-p/55128#M6256</guid>
      <dc:creator>WilliamFlanagan</dc:creator>
      <dc:date>2023-12-12T08:30:39Z</dc:date>
    </item>
  </channel>
</rss>

