Understanding file retention with Vacuum
06-07-2021 02:14 PM
I have seen a few instances where users run OPTIMIZE on the past week's worth of data, follow it with VACUUM with RETAIN 168 HOURS (for example), and then report that the old files aren't being deleted: "VACUUM is not removing old files from the table location".
01-31-2023 07:46 PM
Hello @Venkatesh Kottapalli
VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold.
VACUUM will skip all directories that begin with an underscore (_), which includes the _delta_log.
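The default retention threshold for VACUUM is 7 days. When data files are removed from a Delta table (for example, when OPTIMIZE compacts them or when rows are deleted), they are only marked as removed in the transaction log (_delta_log); they are not deleted from the actual file system. They are deleted from the file system only when you run the VACUUM command after the retention period has expired. This is why files that OPTIMIZE replaced within the past week are still present after VACUUM with RETAIN 168 HOURS: they are not yet older than the threshold. The delta.deletedFileRetentionDuration table property determines how long data files are retained after they are removed from the table.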
%sql
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 7 days')
You can remove files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table.
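To make the retention rule concrete, here is a minimal Python sketch of the eligibility check described above. It is a simplified model, not Delta's actual implementation: the function name eligible_for_vacuum, the file names, and the idea of tracking a single "removed at" timestamp per file are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def eligible_for_vacuum(
    removed_at: Optional[datetime],
    now: datetime,
    retention: timedelta = timedelta(hours=168),
) -> bool:
    """Simplified model of the retention rule (hypothetical helper).

    removed_at is when the file stopped being referenced by the table
    (for example, when OPTIMIZE replaced it). None means the file is
    still part of the latest table version.
    """
    if removed_at is None:
        return False  # live files are never candidates
    return now - removed_at > retention  # must be older than the threshold


now = datetime(2021, 6, 7, 14, 0)
files = {
    "part-0001.parquet": None,                      # still referenced
    "part-0002.parquet": now - timedelta(days=1),   # compacted yesterday
    "part-0003.parquet": now - timedelta(days=8),   # compacted 8 days ago
}

deletable = sorted(
    name for name, removed in files.items()
    if eligible_for_vacuum(removed, now)
)
print(deletable)  # ['part-0003.parquet']
```

In this model, the file compacted yesterday survives a VACUUM with a 168-hour retention, which matches the behavior reported in the question: it only becomes a candidate once more than 7 days have passed since it was removed from the table.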
Ref:
https://docs.databricks.com/delta/vacuum.html
https://docs.databricks.com/sql/language-manual/delta-vacuum.html