Community Platform Discussions

Stream failure JsonParseException

patojo94
New Contributor II

Hi all! I am having the following issue with a couple of PySpark streams.

I have several notebooks, each running an independent Structured Streaming job that reads from a Delta bronze table (gzip Parquet files) that a previous job dumped from Kinesis to S3. Each file contains events in JSON format that need to be aggregated in different ways and then dumped to AWS S3 again (just dumped, not appended to any table). Among the events, I sometimes get a corrupted event in string format, which I need to filter out of the stream. Let's suppose the event is a single string that says "error_event".

At the beginning of the notebook, the first things I do after spark.readStream are:

1. bronze_df.where(f.col("data") != "error_event")
2. apply schema to data column to get expected format from json
 
For some reason I haven't been able to figure out yet, some of the streams fail when I change my cluster mode from Photon to standard, even though they all use the same function to filter the error events. They return the following error.

Error details:

Caused by: org.apache.spark.SparkException: [MALFORMED_RECORD_IN_PARSING.WITHOUT_SUGGESTION] Malformed records are detected in record parsing: [null,null,null,null,null,null,null,null,null,null,null,null,null].

Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'error_event': was expecting (JSON String, Number (or 'NaN'/'INF'/'+INF'), Array, Object or token 'null', 'true' or 'false') at [Source: (InputStreamReader)

Any ideas of what might be causing it? Thanks in advance!

 

1 REPLY

Thank you sir for answering, that helps a lot. Please mark it as a solution.

