
NPE on CreateJacksonParser and Databricks 14.3LTS with Spark StructuredStreaming

joss
New Contributor II

Hello,

I have a Spark Structured Streaming job: the source is a Kafka topic in JSON.

It works fine with Databricks 14.2, but when I change to 14.3 LTS, I get an NPE in CreateJacksonParser:

Caused by: NullPointerException: 
	at org.apache.spark.sql.catalyst.json.CreateJacksonParser$.internalRow(CreateJacksonParser.scala:93)

Do you have any idea what this error is?

Regards


Kaniz
Community Manager

Hi @joss, the NullPointerException (NPE) you're encountering in your Spark Structured Streaming job after upgrading to Databricks 14.3 LTS can be tricky to diagnose, but let's explore some potential causes and solutions:

  1. Nullable String Columns:

    • The error message suggests that the issue might be related to nullable String columns. In Spark, nullable columns can sometimes lead to unexpected behaviour.
    • Ensure that all nullable String columns in your data are properly handled. You can use the na.fill() method to replace null values with appropriate defaults or non-nullable values.
    • Specifically, check if any of the columns involved in your streaming join are nullable Strings. If so, consider converting them to non-nullable types.
  2. Schema Evolution:

    • When upgrading Spark versions, schema evolution can cause issues. Make sure that the schema of your Kafka topic data matches the expected schema in your Spark job.
    • If there have been changes in the data schema (e.g., new fields, altered data types), update your Spark job accordingly.
  3. Data Quality and Null Checks:

    • Before performing joins, ensure that you've applied proper null checks on the join columns.
    • Your existing code snippet shows null checks for eventTime1 and col1, but verify that similar checks are applied consistently across all relevant columns.
  4. Debugging Steps:

    • To pinpoint the exact cause, consider the following steps:
      • Log intermediate data (e.g., print the schema and sample records) to identify any unexpected null values.
      • Temporarily disable parts of your job (e.g., remove the join) and see if the NPE still occurs. This can help narrow down the issue.
      • Inspect the data in your Kafka topic to ensure it adheres to the expected schema.
  5. Upgrade-Specific Issues:

    • Check if there are any known issues with Databricks Runtime 14.3 LTS (which bundles a newer Spark release than 14.2). Sometimes specific versions introduce subtle bugs or behavior changes.
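A minimal defensive sketch for points 1 and 3 above, assuming a DataFrame `df` with a nullable string column named `data` holding the JSON payload (both names are illustrative, not from the original post):

```scala
import org.apache.spark.sql.functions.col

// Option A: drop rows whose JSON payload is null, so the
// parser never receives a null input row.
val nonNull = df.filter(col("data").isNotNull)

// Option B: replace null payloads with an empty JSON object,
// keeping the row count intact while avoiding null parsing.
val filled = df.na.fill(Map("data" -> "{}"))
```

Either variant can be applied before the JSON parsing step; which one is appropriate depends on whether null payloads should be kept as empty records or discarded.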
 

joss
New Contributor II

Hi,

thank you for your quick reply.

I found the problem:

 val newSchema = spark.read.json(df.select("data").as[String]).schema

If "data" has even one null value, it works in 14.2, but with 14.3 LTS this function throws an NPE.

I don't know if it is a bug.
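For reference, a possible workaround is to exclude the null values before letting `spark.read.json` infer the schema. A sketch, assuming `df` has a string column `data` as in the snippet above:

```scala
import spark.implicits._

// Workaround sketch: filter out null payloads before schema
// inference, so CreateJacksonParser never sees a null row.
val jsonStrings = df.select("data").as[String].filter(_ != null)
val newSchema = spark.read.json(jsonStrings).schema
```

This keeps the inferred schema identical to the 14.2 behaviour whenever the non-null payloads carry the full structure.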

Kaniz
Community Manager
Community Manager

Hey there! Thanks a bunch for being part of our awesome community! 🎉

We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution for you. And remember, if you ever need more help, we're here for you!

Keep being awesome! 😊🚀

 
