<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: How to ignore Writestream UnknownFieldException error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-ignore-writestream-unknownfieldexception-error/m-p/64428#M32568</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/93125"&gt;@mvmiller&lt;/a&gt;&amp;nbsp;- Per the documentation below, the default schema evolution mode is addNewColumns: when new columns are detected, the stream fails with UnknownFieldException, and the new columns are added to the tracked schema so that they are picked up on the next run. For that reason,&amp;nbsp;&lt;SPAN&gt;Databricks recommends configuring Auto Loader streams with&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/workflows/" data-linktype="relative-path" target="_blank"&gt;workflows&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;to restart automatically after such schema changes. In the case of an interactive cluster workload, can you please restart the cluster to see whether the new columns are picked up?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/schema" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/schema&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 22 Mar 2024 22:53:08 GMT</pubDate>
    <dc:creator>shan_chandra</dc:creator>
    <dc:date>2024-03-22T22:53:08Z</dc:date>
    <item>
      <title>How to ignore Writestream UnknownFieldException error</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-ignore-writestream-unknownfieldexception-error/m-p/64337#M32545</link>
      <description>&lt;P&gt;I have a parquet file that I am trying to write to a delta table:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;df.writeStream&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .format("delta")&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .option("checkpointLocation", f"{targetPath}/delta/{tableName}/__checkpoints")&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .trigger(once=True)&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .foreachBatch(processTable)&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .outputMode("append")&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .start()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The parquet file is the product of an automatic data pull from a table in SQL Server.&amp;nbsp; Occasionally, a new column is added to the table.&amp;nbsp; When this happens, we see the following error:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;EM&gt;org.apache.spark.sql.catalyst.util.UnknownFieldException: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_FILE] Encountered unknown fields during parsing: &amp;lt;newColumn1&amp;gt;,&amp;lt;newColumn2&amp;gt;, which can be fixed by an automatic retry: true&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;According to the &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/schema" target="_self"&gt;Databricks documentation&lt;/A&gt;, Auto Loader by default will error out when a new column is detected. It says that Databricks recommends incorporating retries at the workflow level.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;For our purposes, we do not want to implement retries in our workflow.&amp;nbsp; We simply want the delta table to add the new column(s) and ingest the new data, without any errors.&lt;/P&gt;&lt;P&gt;Can anyone please advise whether there is a method to do this?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 20:41:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-ignore-writestream-unknownfieldexception-error/m-p/64337#M32545</guid>
      <dc:creator>mvmiller</dc:creator>
      <dc:date>2024-03-21T20:41:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to ignore Writestream UnknownFieldException error</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-ignore-writestream-unknownfieldexception-error/m-p/64428#M32568</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/93125"&gt;@mvmiller&lt;/a&gt;&amp;nbsp;- Per the documentation below, the default schema evolution mode is addNewColumns: when new columns are detected, the stream fails with UnknownFieldException, and the new columns are added to the tracked schema so that they are picked up on the next run. For that reason,&amp;nbsp;&lt;SPAN&gt;Databricks recommends configuring Auto Loader streams with&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/workflows/" data-linktype="relative-path" target="_blank"&gt;workflows&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;to restart automatically after such schema changes. In the case of an interactive cluster workload, can you please restart the cluster to see whether the new columns are picked up?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/schema" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/schema&lt;/A&gt;&lt;/P&gt;
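&lt;P&gt;As a sketch of how this can be wired up end to end (schemaPath, sourcePath, and tablePath below are illustrative placeholders, not taken from your setup): the read side uses Auto Loader with a schema location, and the Delta write inside your processTable batch function opts into schema merging so the new columns land in the table once the stream restarts:&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;# read with Auto Loader; cloudFiles.schemaEvolutionMode defaults to addNewColumns&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;df = (spark.readStream&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .format("cloudFiles")&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .option("cloudFiles.format", "parquet")&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .option("cloudFiles.schemaLocation", schemaPath)&amp;nbsp; # placeholder path&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; .load(sourcePath))&amp;nbsp; # placeholder path&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;# inside the foreachBatch function, merge any new columns into the Delta table&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;def processTable(batchDf, batchId):&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp; (batchDf.write.format("delta")&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp;&amp;nbsp; .option("mergeSchema", "true")&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp;&amp;nbsp; .mode("append")&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp;&amp;nbsp; .save(tablePath))&amp;nbsp; # placeholder path&lt;/P&gt;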
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Mar 2024 22:53:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-ignore-writestream-unknownfieldexception-error/m-p/64428#M32568</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2024-03-22T22:53:08Z</dc:date>
    </item>
  </channel>
</rss>

