<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to deal with delete records from the source Files in DLT . in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/63772#M32343</link>
    <description>&lt;P&gt;Hi Manoj,&lt;/P&gt;&lt;P&gt;No, APPLY CHANGES does not delete data from bronze if the key is not present in the source. It deletes based on the value of some incoming field, such as a status (="Delete").&lt;/P&gt;&lt;P&gt;If no such status can be provided from the source, you will need to execute the deletes again in each layer. Make sure you set the skipChangeCommits flag to true so the streams ignore any deletes and updates; streaming is append-only and hence does not expect deletes or updates in the source. &lt;A href="https://docs.databricks.com/en/delta-live-tables/python-ref.html#configure-a-streaming-table-to-ignore-changes-in-a-source-streaming-table" target="_self"&gt;Link&lt;/A&gt;&lt;/P&gt;&lt;P&gt;A common example is clearing out old data from the source tables: you will need to do this for all layers, as DLT will not do it automatically for you.&lt;/P&gt;</description>
    <pubDate>Fri, 15 Mar 2024 08:04:24 GMT</pubDate>
    <dc:creator>Edthehead</dc:creator>
    <dc:date>2024-03-15T08:04:24Z</dc:date>
    <item>
      <title>How to deal with delete records from the source Files in DLT .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/53934#M29931</link>
      <description>&lt;P&gt;Can the apply_changes feature deal with deleted records in incoming source files?&lt;BR /&gt;By delete I mean the record is physically removed (not a soft delete with a flag).&lt;BR /&gt;If not, how can deleting records from the Bronze streaming table be automated based on the source files?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Nov 2023 09:54:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/53934#M29931</guid>
      <dc:creator>ManojReddy</dc:creator>
      <dc:date>2023-11-27T09:54:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with delete records from the source Files in DLT .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/54199#M30012</link>
      <description>&lt;P&gt;&lt;SPAN&gt;APPLY CHANGES can do upserts, but I have doubts about deleting records by key.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Can APPLY CHANGES delete records from the Bronze streaming table if a key is not present in the source Delta files?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Nov 2023 10:49:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/54199#M30012</guid>
      <dc:creator>ManojReddy</dc:creator>
      <dc:date>2023-11-29T10:49:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with delete records from the source Files in DLT .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/63772#M32343</link>
      <description>&lt;P&gt;Hi Manoj,&lt;/P&gt;&lt;P&gt;No, APPLY CHANGES does not delete data from bronze if the key is not present in the source. It deletes based on the value of some incoming field, such as a status (="Delete").&lt;/P&gt;&lt;P&gt;If no such status can be provided from the source, you will need to execute the deletes again in each layer. Make sure you set the skipChangeCommits flag to true so the streams ignore any deletes and updates; streaming is append-only and hence does not expect deletes or updates in the source. &lt;A href="https://docs.databricks.com/en/delta-live-tables/python-ref.html#configure-a-streaming-table-to-ignore-changes-in-a-source-streaming-table" target="_self"&gt;Link&lt;/A&gt;&lt;/P&gt;&lt;P&gt;A common example is clearing out old data from the source tables: you will need to do this for all layers, as DLT will not do it automatically for you.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2024 08:04:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/63772#M32343</guid>
      <dc:creator>Edthehead</dc:creator>
      <dc:date>2024-03-15T08:04:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with delete records from the source Files in DLT .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/75461#M34965</link>
      <description>&lt;P&gt;Hi Manoj,&lt;/P&gt;&lt;P&gt;Did you find a solution or design change for this problem? We have 200K files in an S3 bucket, and when there is a change in the upstream app we get a new feed; the feed name is fixed. In DLT we should have only the new records from the replaced file, but we still see the previously added records from the same file name. Since we don't have any status indicator on deleted records (these are events from upstream), we are unable to use apply_changes. We also can't do a full refresh: with almost 200K files, a full refresh for a single file replacement takes too long.&lt;/P&gt;</description>
      <pubDate>Sun, 23 Jun 2024 00:42:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/75461#M34965</guid>
      <dc:creator>2vinodhkumar</dc:creator>
      <dc:date>2024-06-23T00:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with delete records from the source Files in DLT .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/75496#M34969</link>
      <description>&lt;P&gt;Hi Vinodh,&lt;/P&gt;&lt;P&gt;It seems DLT cannot handle this on its own.&lt;BR /&gt;I can think of a solution along these lines:&lt;/P&gt;&lt;P&gt;1) Maintain a copy of the 200K files in a separate location (the copied path). DLT should point to this copied path.&lt;/P&gt;&lt;P&gt;2) If there is any change in an incoming file, run a process that inserts the deleted records with a status indicator of "Delete" and then copies the file over to the copied path. To identify the deleted records, compare against the existing file in the copied path. After that you can use DLT, because you now have a status indicator.&lt;/P&gt;&lt;P&gt;Basically, there should be a job that runs at a certain interval (e.g. 10 minutes), tracks changed files based on the last update date, and compares them with the existing files in the copied path to insert the deleted records with a status indicator.&lt;/P&gt;</description>
      <pubDate>Sun, 23 Jun 2024 07:29:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-deal-with-delete-records-from-the-source-files-in-dlt/m-p/75496#M34969</guid>
      <dc:creator>ManojReddy</dc:creator>
      <dc:date>2024-06-23T07:29:59Z</dc:date>
    </item>
  </channel>
</rss>

