<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks Workflow Automatically Marked as Failed When Autoloader Stream Fails in a Task in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-automatically-marked-as-failed-when/m-p/127849#M48106</link>
    <description>&lt;P&gt;Thanks &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/156441"&gt;@SP_6721&lt;/a&gt;&amp;nbsp;but I'm not trying to stop the stream from failing; I'm trying to keep the Databricks workflow/task from being marked as Failed despite the failing streaming query.&lt;/P&gt;&lt;P&gt;Is there any way to override the failure at the engine level? Or is there some option I can configure so that a failing streaming query doesn't get reported to the engine?&lt;/P&gt;</description>
    <pubDate>Fri, 08 Aug 2025 18:27:46 GMT</pubDate>
    <dc:creator>r_g_s_cn</dc:creator>
    <dc:date>2025-08-08T18:27:46Z</dc:date>
    <item>
      <title>Databricks Workflow Automatically Marked as Failed When Autoloader Stream Fails in a Task</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-automatically-marked-as-failed-when/m-p/127711#M48059</link>
      <description>&lt;P&gt;Issue: I want my Databricks Task/Workflow, which is running a pytest test, to not be automatically marked as "Failed" when an Autoloader stream shuts down due to an issue. It seems that if an Autoloader / Structured Streaming stream fails, the whole Databricks Task is automatically marked as Failed, even if the stream's failure is handled by catching the exception.&lt;/P&gt;&lt;P&gt;Context:&lt;/P&gt;&lt;P&gt;- I have a pytest test where the code looks like this:&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;# Loading data with expected schema and running autoloader&lt;/P&gt;&lt;P&gt;load_data_with_expected_schema()&lt;/P&gt;&lt;P&gt;query_one = run_autoloader_pipeline()&lt;/P&gt;&lt;P&gt;query_one.awaitTermination()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;# Loading data with additional column and running autoloader again&lt;/P&gt;&lt;P&gt;load_data_with_unexpected_schema()&lt;/P&gt;&lt;P&gt;try:&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; # This run fails due to additional column in new data&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; query_two = run_autoloader_pipeline()&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; query_two.awaitTermination()&lt;/P&gt;&lt;P&gt;except Exception as e:&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; # This run succeeds because it has picked up the new schema changes&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; query_three = run_autoloader_pipeline()&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; query_three.awaitTermination()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;assert ...&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;- My test passes, but the actual job is marked as a failure with this error:&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;ERROR: Some streams terminated before this command could finish!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;org.apache.spark.sql.catalyst.util.UnknownFieldException: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_RECORD_WITH_FILE_PATH] Encountered unknown fields during parsing: {&amp;quot;some_new_field&amp;quot;:&amp;quot;some_new_top_level_field&amp;quot;}, which can be fixed by an automatic retry: true&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;- I would like the Databricks job to not be marked as failed in cases like the above, where I am purposefully failing an autoloader pipeline.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I have tried:&lt;/P&gt;&lt;P&gt;- Catching the exception for query_two&lt;/P&gt;&lt;P&gt;- Using time.sleep(...) instead of awaitTermination() for query_two&lt;/P&gt;&lt;P&gt;- Using dbutils.notebook.exit to gracefully exit the notebook where the pytests are running&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be much appreciated!&amp;nbsp;If this is the wrong place to post this, please direct me to the correct location.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Aug 2025 18:17:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-workflow-automatically-marked-as-failed-when/m-p/127711#M48059</guid>
      <dc:creator>r_g_s_cn</dc:creator>
      <dc:date>2025-08-07T18:17:49Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Workflow Automatically Marked as Failed When Autoloader Stream Fails in a Task</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-automatically-marked-as-failed-when/m-p/127803#M48085</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/178391"&gt;@r_g_s_cn&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;When a streaming query (like Auto Loader) fails in Databricks, especially due to a schema mismatch, the job or task is automatically marked as FAILED, even if you catch the exception in your code. That’s because the failure is detected at the engine level, outside the Python try/except block.&lt;BR /&gt;To avoid this, set the following option on the Auto Loader reader:&lt;BR /&gt;&lt;EM&gt;cloudFiles.schemaEvolutionMode = "rescue"&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;This makes Auto Loader handle unexpected columns by placing them in the _rescued_data column instead of failing the stream, so your job continues running without being marked as failed.&lt;/P&gt;</description>
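      <content:encoded><![CDATA[<P>For illustration, here is a minimal sketch of how the rescue setting mentioned above might be passed to an Auto Loader reader. The helper name autoloader_options and the paths are hypothetical and not from this thread; the option keys (cloudFiles.format, cloudFiles.schemaLocation, cloudFiles.schemaEvolutionMode) are the documented Auto Loader options. The commented-out spark.readStream call only runs on a Databricks cluster, so treat this as a sketch rather than a tested pipeline.</P>

```python
# Hypothetical helper (not part of any Databricks API): it only builds the
# dictionary of reader options that Auto Loader would receive.
VALID_EVOLUTION_MODES = {"addNewColumns", "rescue", "failOnNewColumns", "none"}


def autoloader_options(file_format, schema_location, evolution_mode="rescue"):
    """Return Auto Loader reader options with the given schema evolution mode."""
    if evolution_mode not in VALID_EVOLUTION_MODES:
        raise ValueError(f"Unsupported schema evolution mode: {evolution_mode!r}")
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaEvolutionMode": evolution_mode,
    }


# Placeholder path, chosen only for this example.
options = autoloader_options("json", "/tmp/example/_schemas")

# On a Databricks cluster the options would be applied roughly like this
# (assumes an active `spark` session; not runnable locally):
# df = (
#     spark.readStream.format("cloudFiles")
#     .options(**options)
#     .load("/tmp/example/input")
# )
```

<P>With rescue, unexpected fields should land in the _rescued_data column rather than stopping the stream. Note that this avoids the stream failure itself; it does not change how Databricks reports a stream that does fail.</P>]]></content:encoded>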
      <pubDate>Fri, 08 Aug 2025 12:20:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-workflow-automatically-marked-as-failed-when/m-p/127803#M48085</guid>
      <dc:creator>SP_6721</dc:creator>
      <dc:date>2025-08-08T12:20:35Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Workflow Automatically Marked as Failed When Autoloader Stream Fails in a Task</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-automatically-marked-as-failed-when/m-p/127849#M48106</link>
      <description>&lt;P&gt;Thanks &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/156441"&gt;@SP_6721&lt;/a&gt;&amp;nbsp;but I'm not trying to stop the stream from failing; I'm trying to keep the Databricks workflow/task from being marked as Failed despite the failing streaming query.&lt;/P&gt;&lt;P&gt;Is there any way to override the failure at the engine level? Or is there some option I can configure so that a failing streaming query doesn't get reported to the engine?&lt;/P&gt;</description>
      <pubDate>Fri, 08 Aug 2025 18:27:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-workflow-automatically-marked-as-failed-when/m-p/127849#M48106</guid>
      <dc:creator>r_g_s_cn</dc:creator>
      <dc:date>2025-08-08T18:27:46Z</dc:date>
    </item>
  </channel>
</rss>

