<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic When running a Merge, if records from the table are deleted are the underlying files that contain the records deleted as well? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/when-running-a-merge-if-records-from-the-table-are-deleted-are/m-p/23898#M16580</link>
    <description>&lt;P&gt;I know I have the option to delete rows from a Delta table when running a merge. But I'm confused about how that would actually affect the files that contain the deleted records. Are those files deleted, or are they rewritten, or what?&lt;/P&gt;</description>
    <pubDate>Wed, 16 Jun 2021 16:38:39 GMT</pubDate>
    <dc:creator>User16826992666</dc:creator>
    <dc:date>2021-06-16T16:38:39Z</dc:date>
    <item>
      <title>When running a Merge, if records from the table are deleted are the underlying files that contain the records deleted as well?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-running-a-merge-if-records-from-the-table-are-deleted-are/m-p/23898#M16580</link>
      <description>&lt;P&gt;I know I have the option to delete rows from a Delta table when running a merge. But I'm confused about how that would actually affect the files that contain the deleted records. Are those files deleted, or are they rewritten, or what?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jun 2021 16:38:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-running-a-merge-if-records-from-the-table-are-deleted-are/m-p/23898#M16580</guid>
      <dc:creator>User16826992666</dc:creator>
      <dc:date>2021-06-16T16:38:39Z</dc:date>
    </item>
    <item>
      <title>Re: When running a Merge, if records from the table are deleted are the underlying files that contain the records deleted as well?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-running-a-merge-if-records-from-the-table-are-deleted-are/m-p/23899#M16581</link>
      <description>&lt;P&gt;Delta does not delete rows in place, and it does not simply delete the files that contain them. It implements MERGE by physically rewriting the affected files.&lt;/P&gt;&lt;P&gt;It is implemented in two steps.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Perform an&amp;nbsp;&lt;B&gt;inner join&lt;/B&gt;&amp;nbsp;between the target table and the source table to select all files that contain at least one matching row.&lt;/LI&gt;&lt;LI&gt;Perform an&amp;nbsp;&lt;B&gt;outer join&lt;/B&gt;&amp;nbsp;between the selected target files and the source table, and write out the updated/deleted/inserted data as new files. Deleted rows are simply left out of the new files, and unmatched rows from the selected files are copied over unchanged.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The commit then records the old files as removed and the new files as added in the transaction log, so readers of the latest table version no longer see the deleted records. Here is an article that explains the DML internals of Delta: &lt;A href="https://databricks.com/blog/2020/09/29/diving-into-delta-lake-dml-internals-update-delete-merge.html" target="test_blank"&gt;https://databricks.com/blog/2020/09/29/diving-into-delta-lake-dml-internals-update-delete-merge.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The old files are not physically deleted right away; they stay in storage, which is what allows time travel to earlier versions. Once they are no longer referenced by the current table version, they are garbage collected when you run &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-vacuum.html" alt="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-vacuum.html" target="_blank"&gt;VACUUM&lt;/A&gt;, which removes unreferenced files older than the retention threshold (7 days by default).&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 20:36:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-running-a-merge-if-records-from-the-table-are-deleted-are/m-p/23899#M16581</guid>
      <dc:creator>sajith_appukutt</dc:creator>
      <dc:date>2021-06-18T20:36:06Z</dc:date>
    </item>
  </channel>
</rss>

