<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Autoloader:  Unexpected UnknownFieldException after streaming query termination in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-unexpected-unknownfieldexception-after-streaming/m-p/128206#M48180</link>
    <description>&lt;P&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;Yes, I’m aware of that behavior and actually expect it. That’s why I’m handling it explicitly in the exception clause:&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;PRE&gt;&amp;nbsp;&lt;SPAN&gt;&lt;SPAN class=""&gt;if "UnknownFieldException" in error_msg and "automatic retry: true" in error_msg: print("Schema evolution detected. Retrying with updated schema...")&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;This works perfectly: it allows the loop to continue and rerun the autoloader with the updated schema. (The except branch is in fact hit several times for this reason, and each time the stream continues with the updated schema, so it works fine.)&lt;/P&gt;&lt;P&gt;What’s strange is that the error message above ("Some streams terminated before this command could finish!") appears &lt;STRONG&gt;after the autoloader has finished running&lt;/STRONG&gt;. The logs even show ‘Streaming job ended.’, &lt;STRONG&gt;but the notebook cell still displays an ERROR. This cell contains only the autoloader code.&lt;/STRONG&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Tue, 12 Aug 2025 12:14:08 GMT</pubDate>
    <dc:creator>yit</dc:creator>
    <dc:date>2025-08-12T12:14:08Z</dc:date>
    <item>
      <title>Autoloader:  Unexpected UnknownFieldException after streaming query termination</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-unexpected-unknownfieldexception-after-streaming/m-p/128202#M48177</link>
      <description>&lt;P&gt;I am using Autoloader to ingest source data into Bronze layer Delta tables. The source files are JSON, and I rely on schema inference along with schema evolution (using mode: addNewColumns). To handle errors triggered by schema updates in the stream, I wrap the streaming query inside a while loop (code shown below). I also use query.awaitTermination() to enable sequential execution of subsequent commands and catch exceptions raised during streaming.&lt;/P&gt;&lt;P&gt;However, when Autoloader finishes, it raises the following error:&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;Some streams terminated before this command could finish!&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN&gt;with details:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;org.apache.spark.sql.catalyst.util.UnknownFieldException: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_RECORD_WITH_FILE_PATH] Encountered unknown fields during parsing, which can be fixed by an automatic retry: true.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;What confuses me is:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;This error appears &lt;STRONG&gt;after&lt;/STRONG&gt; the streaming query has terminated (the log line following query.awaitTermination() is already printed).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;It is &lt;STRONG&gt;not caught&lt;/STRONG&gt; in my exception handling block.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;I have already implemented logic to handle this scenario, so I’m unsure why this error is still being raised.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Could you help clarify why this error might still occur despite my handling, and why it happens after the query termination?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE 
lang="markup"&gt;from pyspark.sql.functions import col

while True:
    try:
        df = (
            spark.readStream.format("cloudFiles")
            .options(**reader_options)
            .load(source_path)
        )

        streaming_query = (
            df.writeStream
            .format("delta")
            .options(**writer_options)
            .outputMode("append")
            .trigger(**set_trigger_kwargs())
            .table(my_table)
        )

        # Block until the query stops; a failed query re-raises its exception here
        streaming_query.awaitTermination()

        # If the query terminated without raising an error, exit the loop
        break

    except Exception as e:
        error_msg = str(e)
        if "UnknownFieldException" in error_msg and "automatic retry: true" in error_msg:
            print("Schema evolution detected. Retrying with updated schema...")
        else:
            # Log and re-raise any other exception with its original traceback
            print(f"Streaming query exception: {error_msg}")
            raise

print("Streaming job ended.")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Aug 2025 11:39:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-unexpected-unknownfieldexception-after-streaming/m-p/128202#M48177</guid>
      <dc:creator>yit</dc:creator>
      <dc:date>2025-08-12T11:39:15Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader:  Unexpected UnknownFieldException after streaming query termination</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-unexpected-unknownfieldexception-after-streaming/m-p/128205#M48179</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175553"&gt;@yit&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;This is the expected behaviour of Auto Loader with schema evolution enabled. The default mode is addNewColumns, which causes the stream to fail when new columns appear.&lt;BR /&gt;As the documentation says:&lt;BR /&gt;&lt;BR /&gt;"Auto Loader&amp;nbsp;detects the addition of new columns as it processes your data. When&amp;nbsp;Auto Loader&amp;nbsp;detects a new column, &lt;STRONG&gt;the stream stops with an&amp;nbsp;UnknownFieldException&lt;/STRONG&gt;. Before your stream throws this error,&amp;nbsp;Auto Loader&amp;nbsp;performs schema inference on the latest micro-batch of data and updates the schema location with the latest schema by merging new columns to the end of the schema. The data types of existing columns remain unchanged.&lt;STRONG&gt; Databricks recommends configuring&amp;nbsp;Auto Loader&amp;nbsp;streams with&amp;nbsp;Lakeflow Jobs&amp;nbsp;to restart automatically after such schema changes&lt;/STRONG&gt;."&lt;/P&gt;&lt;P&gt;And since you didn't specify a mode, addNewColumns is used by default:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1755000042822.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19012iCB79DC6D3BBE92CD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1755000042822.png" alt="szymon_dybczak_0-1755000042822.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Aug 2025 12:00:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-unexpected-unknownfieldexception-after-streaming/m-p/128205#M48179</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-08-12T12:00:46Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader:  Unexpected UnknownFieldException after streaming query termination</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-unexpected-unknownfieldexception-after-streaming/m-p/128206#M48180</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;Yes, I’m aware of that behavior and actually expect it. That’s why I’m handling it explicitly in the exception clause:&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;PRE&gt;&amp;nbsp;&lt;SPAN&gt;&lt;SPAN class=""&gt;if "UnknownFieldException" in error_msg and "automatic retry: true" in error_msg: print("Schema evolution detected. Retrying with updated schema...")&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;This works perfectly: it allows the loop to continue and rerun the autoloader with the updated schema. (The except branch is in fact hit several times for this reason, and each time the stream continues with the updated schema, so it works fine.)&lt;/P&gt;&lt;P&gt;What’s strange is that the error message above ("Some streams terminated before this command could finish!") appears &lt;STRONG&gt;after the autoloader has finished running&lt;/STRONG&gt;. The logs even show ‘Streaming job ended.’, &lt;STRONG&gt;but the notebook cell still displays an ERROR. This cell contains only the autoloader code.&lt;/STRONG&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 12 Aug 2025 12:14:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-unexpected-unknownfieldexception-after-streaming/m-p/128206#M48180</guid>
      <dc:creator>yit</dc:creator>
      <dc:date>2025-08-12T12:14:08Z</dc:date>
    </item>
  </channel>
</rss>

