Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to ignore writeStream UnknownFieldException error

mvmiller
New Contributor III

I have a parquet file that I am trying to write to a delta table:

(df.writeStream
    .format("delta")
    .option("checkpointLocation", f"{targetPath}/delta/{tableName}/__checkpoints")
    .trigger(once=True)
    .foreachBatch(processTable)
    .outputMode("append")
    .start())
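For the Delta table itself to pick up new columns, the batch write inside `processTable` needs schema merging enabled. A minimal sketch of what that callback could look like (the factory function, parameter names, and write options here are illustrative assumptions, not the poster's actual code):

```python
def make_process_table(target_path):
    """Build a foreachBatch callback that appends each micro-batch to the
    Delta table at target_path, merging any new source columns into the
    table schema via mergeSchema."""
    def process_table(batch_df, batch_id):
        (batch_df.write
            .format("delta")
            .option("mergeSchema", "true")  # let new columns be added to the table
            .mode("append")
            .save(target_path))
    return process_table

# usage: .foreachBatch(make_process_table(f"{targetPath}/delta/{tableName}"))
```

Note that `mergeSchema` only governs the Delta write; it does not by itself stop Auto Loader from failing the stream when it first encounters the new columns on the read side.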

 

The parquet file is produced by an automated data pull from a table in SQL Server. Occasionally a new column is added to the source table, and when this happens we see the following error:

org.apache.spark.sql.catalyst.util.UnknownFieldException: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_FILE] Encountered unknown fields during parsing: <newColumn1>,<newColumn2>, which can be fixed by an automatic retry: true

According to the Databricks documentation, Auto Loader by default errors out when a new column is detected, and Databricks recommends incorporating retries at the workflow level.

For our purposes, we do not want to implement retries in our workflow. We simply want the Delta table to add the new column(s) and ingest the new data, without any errors.

Can anyone advise whether there is a way to do this?
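If workflow-level retries are off the table, one pattern is to retry inside the job itself: catch the schema-change failure and start the stream again, so the restart picks up the evolved schema. A minimal sketch of such a loop (the helper name and message matching are illustrative assumptions; on Databricks, `start_stream` would start the writeStream and call `awaitTermination()`):

```python
def run_with_schema_retry(start_stream, max_retries=3):
    """Call start_stream(); if it raises an error whose message mentions
    UNKNOWN_FIELD_EXCEPTION (as Auto Loader does after a schema change),
    retry up to max_retries times, since the restarted stream uses the
    updated schema. Any other error is re-raised immediately."""
    attempt = 0
    while True:
        try:
            return start_stream()
        except Exception as e:
            if "UNKNOWN_FIELD_EXCEPTION" not in str(e) or attempt >= max_retries:
                raise
            attempt += 1  # schema changed: restart the stream
```

With `trigger(once=True)`, a single retry after the failure is normally enough for the new columns to land, since Auto Loader updates its tracked schema before failing the stream.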

1 REPLY

shan_chandra
Databricks Employee

@mvmiller - Per the documentation below, the stream will fail with unknownFieldException because the default schema evolution mode is addNewColumns. Databricks therefore recommends configuring Auto Loader streams within workflows so they restart automatically after such schema changes. For an interactive cluster workload, can you please restart the cluster (or the stream) and see if the new columns are picked up?

https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/schema
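For reference, the schema-evolution behavior is controlled on the read side of the stream. A sketch of an Auto Loader reader with the documented `cloudFiles` options made explicit (the builder function and paths are illustrative assumptions, not code from the thread):

```python
def build_autoloader_reader(spark, source_path, schema_location):
    """Return a streaming DataFrame read via Auto Loader. With the default
    addNewColumns mode, the stream fails once when new columns appear and
    picks them up on the next start; the 'rescue' and 'none' modes avoid
    the failure but do not add the columns to the schema."""
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.schemaLocation", schema_location)
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # the default
        .load(source_path))
```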

 
