<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Workflow Notification: Pass/Failed – Schema Evolution/Rescue Mode Triggered for complex JSON file in Databricks Free Edition Help</title>
    <link>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/154053#M746</link>
    <description>&lt;P&gt;If new fields are added to the source JSON file, the nested field names are captured in the rescue column. However, we get no indication of the operation type, such as addition, deletion, or a data type change. Attribute type changes and attribute deletions are not captured at all; in those two scenarios, no values are stored in the rescue column.&lt;/P&gt;</description>
    <pubDate>Fri, 10 Apr 2026 09:24:59 GMT</pubDate>
    <dc:creator>SantiNath_Dey</dc:creator>
    <dc:date>2026-04-10T09:24:59Z</dc:date>
    <item>
      <title>Workflow Notification: Pass/Failed – Schema Evolution/Rescue Mode Triggered for complex JSON file</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/150657#M711</link>
      <description>&lt;P&gt;We are implementing an incremental load for semi-structured data [a complex nested JSON file] using &lt;STRONG&gt;Auto Loader&lt;/STRONG&gt;. To handle schema drift, such as new fields, changes in column order, or data type and precision modifications (e.g., Decimal and Integer), we are using &lt;STRONG&gt;'rescue' mode&lt;/STRONG&gt;. If these changes occur in a subsequent batch, the pipeline should log the mismatched entries to a Delta table and trigger a &lt;STRONG&gt;Databricks Workflow&lt;/STRONG&gt; email notification containing a full description of the drift. Could you please suggest an approach and pseudocode? Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2026 05:58:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/150657#M711</guid>
      <dc:creator>SantiNath_Dey</dc:creator>
      <dc:date>2026-03-12T05:58:59Z</dc:date>
    </item>
    <item>
      <title>Re: Workflow Notification: Pass/Failed – Schema Evolution/Rescue Mode Triggered for complex JSON file</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/150667#M712</link>
      <description>&lt;P class="p1"&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/219024"&gt;@SantiNath_Dey&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;Good question. This is a pretty common pattern, and yes — Auto Loader rescue mode is a strong fit for it. The cleanest way to think about the solution is in three parts: ingest safely, detect drift, and surface it through workflow failure notifications.&lt;/P&gt;
&lt;P class="p1"&gt;&lt;STRONG&gt;Step 1: Use Auto Loader in rescue mode&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="p1"&gt;The key setting here is:&lt;/P&gt;
&lt;P class="p4"&gt;cloudFiles.schemaEvolutionMode = "rescue"&lt;/P&gt;
&lt;P class="p1"&gt;That tells Auto Loader not to evolve the schema and not to fail the stream when it encounters unexpected fields, type changes, or precision mismatches. Instead, anything that does not conform to the schema you provided gets captured in the &lt;SPAN class="s2"&gt;_rescued_data&lt;/SPAN&gt; column as JSON.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.types import StructType

input_path      = "s3://.../landing/json"
bronze_table    = "raw.bronze_events"
checkpoint_path = "s3://.../chk/autoloader/json"

schema = StructType([
    # Define your expected nested JSON schema here
])

df = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", checkpoint_path + "/schema")
        .option("cloudFiles.schemaEvolutionMode", "rescue")
        .schema(schema)
        .load(input_path)
        # Expose file metadata explicitly so the drift log can record the source file
        .select("*", "_metadata")
)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;Since you are working with complex nested JSON, I would also strongly consider &lt;SPAN class="s1"&gt;cloudFiles.schemaHints&lt;/SPAN&gt; for any critical nested fields. On deeply nested structures, first-pass inference can get a little unpredictable, and schema hints help you lock down the parts that matter most.&lt;/P&gt;
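&lt;P class="p1"&gt;As a sketch (the nested field names below are placeholders, not from your actual schema), hints are just a DDL-style string of column paths and types, and a small helper keeps them readable:&lt;/P&gt;

```python
def build_schema_hints(hints):
    """Render a {column_path: sql_type} mapping as a DDL-style string
    suitable for the cloudFiles.schemaHints option."""
    return ", ".join(f"{path} {sql_type}" for path, sql_type in hints.items())

# Hypothetical nested fields; replace with the ones that matter in your JSON
schema_hints = build_schema_hints({
    "payload.order.amount": "DECIMAL(18,2)",
    "payload.order.quantity": "INT",
})

# Then pass it to the reader alongside the other options:
#   .option("cloudFiles.schemaHints", schema_hints)
```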
&lt;P class="p1"&gt;&lt;STRONG&gt;Step 2: Detect schema drift and write it to a Delta table&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="p1"&gt;From there, use &lt;SPAN class="s1"&gt;foreachBatch&lt;/SPAN&gt; to inspect each micro-batch for rescued rows. If &lt;SPAN class="s1"&gt;_rescued_data&lt;/SPAN&gt; is populated, that is effectively your drift signal. You can log those records to a dedicated Delta table with a timestamp and file metadata so you have a proper audit trail.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import current_timestamp, col

drift_table = "quality.schema_drift_events"

def record_drift(batch_df, batch_id):
    # First write the full batch to Bronze
    batch_df.write.format("delta").mode("append").saveAsTable(bronze_table)

    # Then isolate drifted rows
    drift_df = (
        batch_df
          .filter(col("_rescued_data").isNotNull())
          .select(
              current_timestamp().alias("detected_at"),
              col("_metadata.file_name").alias("file_name"),
              col("_rescued_data")
          )
    )

    drift_count = drift_df.count()
    if drift_count &amp;gt; 0:
        drift_df.write.format("delta").mode("append").saveAsTable(drift_table)

        # Fail the task so the Workflow failure notification fires
        raise Exception(
            f"Schema drift detected in {drift_count} records. "
            f"See table {drift_table} for full details."
        )

(
    df.writeStream
      .foreachBatch(record_drift)
      .option("checkpointLocation", checkpoint_path + "/stream")
      .trigger(availableNow=True)
      .start()
)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;That intentional &lt;SPAN class="s1"&gt;raise Exception&lt;/SPAN&gt; is doing two useful things at once:&lt;/P&gt;
&lt;OL start="1"&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;It persists the drift details to a Delta table&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;It forces the Workflow task to fail, which becomes the trigger for notification&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="p1"&gt;&lt;STRONG&gt;Step 3: Let Databricks Workflows handle the email alert&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="p1"&gt;Once the task fails, Databricks Workflows can take over and send the notification. You do not get total control over the email body, but in this case that is usually fine.&lt;/P&gt;
&lt;P class="p1"&gt;The failure email and the job run details will include the exception message, something like:&lt;/P&gt;
&lt;P class="p1"&gt;“Schema drift detected in N records. See table quality.schema_drift_events for full details.”&lt;/P&gt;
&lt;P class="p1"&gt;That gives the team the signal immediately, while the actual detail — what drifted, from which file, and when — is preserved in the Delta log table for investigation.&lt;/P&gt;
&lt;P class="p1"&gt;So the setup becomes pretty straightforward:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;build the ingestion as a Workflow task&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;configure email notifications on task failure&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;point the alerts to the appropriate DL or owner group&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
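&lt;P class="p1"&gt;For reference, the Workflow side is just the &lt;SPAN class="s1"&gt;email_notifications&lt;/SPAN&gt; block in the job settings. A minimal sketch, where the job name, notebook path, and address are all placeholders:&lt;/P&gt;

```python
# Jobs API 2.1-style settings; every name and address here is a placeholder
job_settings = {
    "name": "json_incremental_load",
    "tasks": [
        {
            "task_key": "autoloader_ingest",
            "notebook_task": {"notebook_path": "/Pipelines/ingest_json"},
        }
    ],
    "email_notifications": {
        # Fires when record_drift raises and the task fails
        "on_failure": ["data-eng-alerts@example.com"]
    },
}
```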
&lt;P class="p1"&gt;&lt;STRONG&gt;A few practical callouts&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;The &lt;SPAN class="s1"&gt;.count()&lt;/SPAN&gt; inside &lt;SPAN class="s1"&gt;foreachBatch&lt;/SPAN&gt; is perfectly reasonable at moderate scale. If you are dealing with very large micro-batches, you may want to log first and let a lightweight downstream step determine whether new drift records were written.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;If later on you decide you want to accept new columns automatically but still catch type or precision mismatches, then &lt;SPAN class="s1"&gt;schemaEvolutionMode = "addNewColumns"&lt;/SPAN&gt; plus a rescued data column is worth looking at. That is a different operating model, but sometimes the right one as pipelines mature.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;_rescued_data&lt;/SPAN&gt; works well for capturing drift, but deeply nested structures can still produce edge cases. That is another reason I like pairing rescue mode with a well-defined schema and targeted schema hints instead of relying too heavily on inference.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
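&lt;P class="p1"&gt;If you do try that model later, the switch is a one-option change on the same reader. A sketch of the relevant options (everything else stays as in Step 1):&lt;/P&gt;

```python
# Alternative operating model: accept new columns automatically.
# With addNewColumns the stream stops once to evolve the stored schema and
# picks the new columns up on restart, while type and precision mismatches
# still land in _rescued_data.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
}
# Apply with: for k, v in autoloader_options.items(): reader = reader.option(k, v)
```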
&lt;P class="p1"&gt;Hope that helps. Let us know how it goes.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2026 10:27:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/150667#M712</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-03-12T10:27:45Z</dc:date>
    </item>
    <item>
      <title>Re: Workflow Notification: Pass/Failed – Schema Evolution/Rescue Mode Triggered for complex JSON file</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/150692#M713</link>
      <description>&lt;P&gt;Thanks for the quick response. Let me implement the code.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2026 13:26:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/150692#M713</guid>
      <dc:creator>SantiNath_Dey</dc:creator>
      <dc:date>2026-03-12T13:26:25Z</dc:date>
    </item>
    <item>
      <title>Re: Workflow Notification: Pass/Failed – Schema Evolution/Rescue Mode Triggered for complex JSON file</title>
      <link>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/154053#M746</link>
      <description>&lt;P&gt;If new fields are added to the source JSON file, the nested field names are captured in the rescue column. However, we get no indication of the operation type, such as addition, deletion, or a data type change. Attribute type changes and attribute deletions are not captured at all; in those two scenarios, no values are stored in the rescue column.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 09:24:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/databricks-free-edition-help/workflow-notification-pass-failed-schema-evolution-rescue-mode/m-p/154053#M746</guid>
      <dc:creator>SantiNath_Dey</dc:creator>
      <dc:date>2026-04-10T09:24:59Z</dc:date>
    </item>
  </channel>
</rss>

