<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic DLT Pipeline with unknown deleted source data in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-with-unknown-deleted-source-data/m-p/135774#M50430</link>
    <description>&lt;DIV&gt;Hello.. I need help.&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;So the context is :&amp;nbsp;&lt;DIV&gt;- ERP data for company in my group is stored in sql tables&amp;nbsp;&lt;DIV&gt;- Currently, once per day we copy the last 2 months of data (creation date) from each table into our datalake landing zone (we can however do full copies if needed)&lt;DIV&gt;- A databricks DLT pipeline then ingests this data : append into bronze using readstream autoloader and then apply_changes upsert scd type 1 into silver&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;This was working fine until we realised there were rows being deleted from the source data and no flag and no reference table - i.e. the only way for us to know that a line has been deleted is to compare with the data we have.&amp;nbsp;&lt;DIV&gt;So we want to update the silver with this information by adding a is_deleted = True flag to the rows that have been deleted.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;My basic plan is based around do a weekly catch up. Every saturday we collect a full copy of all the tables.&amp;nbsp;&lt;DIV&gt;There have then been two main attemps.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;1. Compare this latest data ingestion with the current silver table&amp;nbsp;&lt;DIV&gt;-&amp;gt; Immediate no go : Cannot read silver table that is the target table of the current pipeline.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;2. Compare the latest data ingestion with the previous bronze data.&amp;nbsp;&lt;DIV&gt;- At start of silver pipeline -&amp;gt; load bronze -&amp;gt; split into latest ingestion and the rest (**)&lt;DIV&gt;- If a line is found in previous data but not in the latest full ingestion = it has been deleted&lt;DIV&gt;- I added a parameter to the function such that if delete_missing_rows&amp;nbsp;&amp;nbsp;(signifying a full catch up)= True, it does this comparison, if= False, it does like before and just upserts the last 2 months of data.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;Then i started hitting problems.&amp;nbsp;&lt;DIV&gt;My first attempt at doing a full catch up where delete_missing_rows= True failed because the pipeline claimed that there was a schema change because there was the column "__enzyme__row__id" in the target silver table and not in the&amp;nbsp;&lt;DIV&gt;To bypass this i do a full refresh.&amp;nbsp;&lt;DIV&gt;This worked and i was really happy. My silver table now has flagged rows that had been deleted from the source data.&amp;nbsp;&lt;DIV&gt;However a new problem came when i tested going back to doing a normal run (the full catch up is only once per week, the rest of the week the table is upserted into with 2 months of data).&lt;DIV&gt;When going back to a normal update i get the problem seen in the attachment image.png (which from what i can understand comes from the fact that ther source table which is the non temporary staging dlt.table) has changed - since there is not the same comparison and addition of deleted = true flags&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;I have attached the function.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Hopefully you have enough info.&lt;/DIV&gt;&lt;DIV&gt;Thanks in advance !&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Wed, 22 Oct 2025 21:42:21 GMT</pubDate>
    <dc:creator>janglais</dc:creator>
    <dc:date>2025-10-22T21:42:21Z</dc:date>
    <item>
      <title>DLT Pipeline with unknown deleted source data</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-with-unknown-deleted-source-data/m-p/135774#M50430</link>
      <description>&lt;DIV&gt;Hello.. I need help.&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;So the context is :&amp;nbsp;&lt;DIV&gt;- ERP data for company in my group is stored in sql tables&amp;nbsp;&lt;DIV&gt;- Currently, once per day we copy the last 2 months of data (creation date) from each table into our datalake landing zone (we can however do full copies if needed)&lt;DIV&gt;- A databricks DLT pipeline then ingests this data : append into bronze using readstream autoloader and then apply_changes upsert scd type 1 into silver&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;This was working fine until we realised there were rows being deleted from the source data and no flag and no reference table - i.e. the only way for us to know that a line has been deleted is to compare with the data we have.&amp;nbsp;&lt;DIV&gt;So we want to update the silver with this information by adding a is_deleted = True flag to the rows that have been deleted.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;My basic plan is based around do a weekly catch up. Every saturday we collect a full copy of all the tables.&amp;nbsp;&lt;DIV&gt;There have then been two main attemps.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;1. Compare this latest data ingestion with the current silver table&amp;nbsp;&lt;DIV&gt;-&amp;gt; Immediate no go : Cannot read silver table that is the target table of the current pipeline.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;2. Compare the latest data ingestion with the previous bronze data.&amp;nbsp;&lt;DIV&gt;- At start of silver pipeline -&amp;gt; load bronze -&amp;gt; split into latest ingestion and the rest (**)&lt;DIV&gt;- If a line is found in previous data but not in the latest full ingestion = it has been deleted&lt;DIV&gt;- I added a parameter to the function such that if delete_missing_rows&amp;nbsp;&amp;nbsp;(signifying a full catch up)= True, it does this comparison, if= False, it does like before and just upserts the last 2 months of data.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;Then i started hitting problems.&amp;nbsp;&lt;DIV&gt;My first attempt at doing a full catch up where delete_missing_rows= True failed because the pipeline claimed that there was a schema change because there was the column "__enzyme__row__id" in the target silver table and not in the&amp;nbsp;&lt;DIV&gt;To bypass this i do a full refresh.&amp;nbsp;&lt;DIV&gt;This worked and i was really happy. My silver table now has flagged rows that had been deleted from the source data.&amp;nbsp;&lt;DIV&gt;However a new problem came when i tested going back to doing a normal run (the full catch up is only once per week, the rest of the week the table is upserted into with 2 months of data).&lt;DIV&gt;When going back to a normal update i get the problem seen in the attachment image.png (which from what i can understand comes from the fact that ther source table which is the non temporary staging dlt.table) has changed - since there is not the same comparison and addition of deleted = true flags&lt;DIV&gt;&amp;nbsp;&lt;DIV&gt;I have attached the function.&amp;nbsp;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Hopefully you have enough info.&lt;/DIV&gt;&lt;DIV&gt;Thanks in advance !&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 22 Oct 2025 21:42:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-with-unknown-deleted-source-data/m-p/135774#M50430</guid>
      <dc:creator>janglais</dc:creator>
      <dc:date>2025-10-22T21:42:21Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with unknown deleted source data</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-with-unknown-deleted-source-data/m-p/135849#M50444</link>
      <description>&lt;P&gt;Your solution #1 is very frustrating to me as well, for a number of reasons.&amp;nbsp; Simply put, we have to be able to compare incoming data to target data for normal ETL operations.&amp;nbsp;&lt;/P&gt;&lt;P&gt;One way around this is to create a view of your target silver table, outside of your pipeline (this part is key), and compare against that view.&amp;nbsp; The pipeline will give you a warning telling you that you're being naughty, but it will let you do the comparison anyways.&amp;nbsp; We have had to do this and it does work, although it's clearly not a desirable solution to have to maintain this set of views, but it would allow you to perform the deletion check.&lt;/P&gt;&lt;P&gt;Example: `create or replace view silver.vw_target_table as select * from silver.target_table;`&lt;/P&gt;&lt;P&gt;Someone else might be able to chime in with a more elegant solution, but that's what I've got for you at least!&lt;/P&gt;</description>
      <pubDate>Thu, 23 Oct 2025 14:25:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-with-unknown-deleted-source-data/m-p/135849#M50444</guid>
      <dc:creator>madams</dc:creator>
      <dc:date>2025-10-23T14:25:19Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with unknown deleted source data</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-with-unknown-deleted-source-data/m-p/135874#M50450</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The issue you're experiencing is a common challenge in Delta Live Tables (DLT) when implementing mixed refresh patterns (weekly full refresh + daily incremental updates) with schema evolution. The "__enzyme__row__id" column and schema mismatch errors are indicators of DLT's internal tracking mechanisms conflicting with your conditional logic.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Root Cause Analysis&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The core problem stems from DLT's expectation of consistent schema across pipeline runs. When you switch between full catch-up mode (with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;is_deleted&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;flag logic) and normal incremental mode, you're essentially creating two different schemas:​&lt;/P&gt;
&lt;OL class="marker:text-quiet list-decimal"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Full catch-up mode&lt;/STRONG&gt;: Includes additional columns and logic for deleted record detection&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Normal mode&lt;/STRONG&gt;: Standard upsert without deleted record tracking&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The "__enzyme__row__id" column is DLT's internal row tracking mechanism , and schema mismatches occur when DLT expects consistent column structures between runs.​&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Recommended Solutions&lt;/H2&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Solution 1: Consistent Schema Approach (Recommended)&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Always include the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;is_deleted&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;column in your silver table schema, regardless of the refresh type:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;&lt;/SPAN&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;table&lt;/SPAN&gt;
&lt;SPAN class="token token"&gt;def&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;silver_table&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    df &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; dlt&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;read&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"bronze_table"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    
    &lt;SPAN class="token token"&gt;# Always add is_deleted column, defaulting to False for normal runs&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;if&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;not&lt;/SPAN&gt; spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;get&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"delete_missing_rows"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"false"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;lower&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token token operator"&gt;==&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
        df &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;withColumn&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"is_deleted"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; lit&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token boolean"&gt;False&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;else&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
        &lt;SPAN class="token token"&gt;# Your existing full catch-up logic here&lt;/SPAN&gt;
        &lt;SPAN class="token token"&gt;pass&lt;/SPAN&gt;
    
    &lt;SPAN class="token token"&gt;return&lt;/SPAN&gt; df
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Solution 2: Separate Tables Strategy&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Create separate silver tables for different refresh patterns:&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;CODE&gt;silver_table_incremental&lt;/CODE&gt;: For daily updates&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;CODE&gt;silver_table_full&lt;/CODE&gt;: For weekly full refreshes with delete tracking&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Then use a view or downstream table to union them appropriately.​&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Solution 3: Schema Evolution Configuration&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Enable automatic schema evolution in your DLT pipeline configuration:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.databricks.delta.schema.autoMerge.enabled"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;However, use this cautiously as it can mask unintended schema changes.​&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Best Practices for Your Use Case&lt;/H2&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Weekly Full Refresh Strategy&lt;/H2&gt;
&lt;OL class="marker:text-quiet list-decimal"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Use DLT's built-in full refresh capability&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;rather than custom logic​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Schedule separate pipelines&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for full vs incremental refreshes&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Leverage Change Data Feed (CDF)&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for deleted record tracking​&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Handling Deleted Records&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Consider implementing a more robust deleted record detection strategy:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;&lt;/SPAN&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;table&lt;/SPAN&gt;
&lt;SPAN class="token token"&gt;def&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;silver_table_with_deletes&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    current_data &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; dlt&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;read&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"bronze_latest"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    
    &lt;SPAN class="token token"&gt;if&lt;/SPAN&gt; is_full_refresh_run&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
        &lt;SPAN class="token token"&gt;# Compare with previous state&lt;/SPAN&gt;
        previous_data &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; dlt&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;read&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"bronze_previous"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; 
        deleted_records &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; detect_deleted_records&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;previous_data&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; current_data&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
        &lt;SPAN class="token token"&gt;return&lt;/SPAN&gt; current_data&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;union&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;deleted_records&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;withColumn&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"is_deleted"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; lit&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token boolean"&gt;True&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;else&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
        &lt;SPAN class="token token"&gt;return&lt;/SPAN&gt; current_data&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;withColumn&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"is_deleted"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; lit&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token boolean"&gt;False&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Implementation Recommendations&lt;/H2&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Option A: Redesign with Consistent Schema&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Always include&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;is_deleted&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;column in silver table&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Use pipeline parameters to control delete detection logic&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Maintain schema consistency across all runs&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Option B: Use DLT's Native Capabilities&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Leverage&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;apply_changes&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;apply_as_deletes&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;parameter​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Enable Change Data Feed on source tables&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Use SCD Type 1 for complete record removal&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Option C: Dual Pipeline Approach&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Separate pipeline for weekly full refresh with delete detection&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Daily incremental pipeline for standard upserts&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Downstream merge process to combine results&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Avoiding Full Refresh Requirements&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;To prevent the need for full refreshes due to schema changes:&lt;/P&gt;
&lt;OL class="marker:text-quiet list-decimal"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Define complete schema upfront&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;including optional columns&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Use schema evolution settings&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;appropriate for your use case​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Test schema changes&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in development before production deployment&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Consider using streaming tables&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;instead of materialized views for more flexibility​&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The most sustainable approach would be Option A (consistent schema) combined with DLT's native delete handling capabilities, as it maintains schema consistency while providing the flexibility you need for both refresh patterns.&lt;/P&gt;</description>
      <pubDate>Thu, 23 Oct 2025 17:47:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-with-unknown-deleted-source-data/m-p/135874#M50450</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-10-23T17:47:59Z</dc:date>
    </item>
  </channel>
</rss>

