<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Understanding file retention with Vacuum in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/understanding-file-retention-with-vacuum/m-p/25822#M18023</link>
    <description>&lt;P&gt;I have seen a few instances where users report that they run OPTIMIZE on the past week's worth of data and follow it with VACUUM with RETAIN 168 HOURS (for example), but the old files aren't deleted: "VACUUM is not removing old files from the table location".&lt;/P&gt;</description>
    <pubDate>Mon, 07 Jun 2021 21:14:43 GMT</pubDate>
    <dc:creator>User16783853906</dc:creator>
    <dc:date>2021-06-07T21:14:43Z</dc:date>
    <item>
      <title>Understanding file retention with Vacuum</title>
      <link>https://community.databricks.com/t5/data-engineering/understanding-file-retention-with-vacuum/m-p/25822#M18023</link>
      <description>&lt;P&gt;I have seen a few instances where users report that they run OPTIMIZE on the past week's worth of data and follow it with VACUUM with RETAIN 168 HOURS (for example), but the old files aren't deleted: "VACUUM is not removing old files from the table location".&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jun 2021 21:14:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/understanding-file-retention-with-vacuum/m-p/25822#M18023</guid>
      <dc:creator>User16783853906</dc:creator>
      <dc:date>2021-06-07T21:14:43Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding file retention with Vacuum</title>
      <link>https://community.databricks.com/t5/data-engineering/understanding-file-retention-with-vacuum/m-p/25823#M18024</link>
      <description>&lt;P&gt;Hello @Venkatesh Kottapalli&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold.&lt;/P&gt;&lt;P&gt;VACUUM skips all directories that begin with an underscore (_), which includes _delta_log.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;The default retention threshold for VACUUM is 1 week.&lt;/B&gt; When data files are removed from a Delta table (for example, when OPTIMIZE compacts them or when rows are deleted), they are marked as removed in the transaction log (_delta_log) but stay on the file system. They are physically deleted only when you run VACUUM and they are older than the retention threshold; the retention period expiring does not delete them by itself. The threshold is measured from the time a file was removed from the table, not from the time it was written, so files that OPTIMIZE replaced today become eligible for VACUUM with RETAIN 168 HOURS only 7 days from now. The delta.deletedFileRetentionDuration table property determines how long data files are retained after they are removed from the table:&lt;/P&gt;&lt;P&gt;%sql&lt;/P&gt;&lt;P&gt;ALTER TABLE table_name&lt;/P&gt;&lt;P&gt;SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 7 days')&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can remove files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Ref:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/delta/vacuum.html" alt="https://docs.databricks.com/delta/vacuum.html" target="_blank"&gt;https://docs.databricks.com/delta/vacuum.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/sql/language-manual/delta-vacuum.html" alt="https://docs.databricks.com/sql/language-manual/delta-vacuum.html" target="_blank"&gt;https://docs.databricks.com/sql/language-manual/delta-vacuum.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Feb 2023 03:46:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/understanding-file-retention-with-vacuum/m-p/25823#M18024</guid>
      <dc:creator>Priyanka_Biswas</dc:creator>
      <dc:date>2023-02-01T03:46:06Z</dc:date>
    </item>
  </channel>
</rss>

