
Understanding file retention with Vacuum

User16783853906
Contributor III

I have seen a few instances where users report that they run OPTIMIZE over the past week's worth of data and then follow with VACUUM using RETAIN 168 HOURS (for example), but the old files are not being deleted: "VACUUM is not removing old files from the table location."
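For concreteness, the workflow described above looks roughly like this (the table name my_table is a placeholder):

```sql
-- Compact recent data files
OPTIMIZE my_table;

-- Attempt to clean up old files with a 7-day (168-hour) retention window
VACUUM my_table RETAIN 168 HOURS;
```

After this sequence, users still see the old data files in the table location.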


Priyanka_Biswas
Valued Contributor

Hello @Venkatesh Kottapalli

VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the table's transaction log and are older than a retention threshold.

VACUUM will skip all directories that begin with an underscore (_), which includes the _delta_log.

The default retention interval for VACUUM is 1 week (168 hours). When you drop a Delta table or delete data files, the deletion is recorded in the _delta_log (which acts like a Hive metastore), but the files are not removed from the underlying file system. They are only physically deleted when you run the VACUUM command and the retention period has expired. The table property delta.deletedFileRetentionDuration determines how long data files are retained after they are logically deleted, so files newer than this threshold will not be removed even if VACUUM runs successfully.

%sql

ALTER TABLE table_name

SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 7 days');

You can remove files that are no longer referenced by a Delta table and that are older than the retention threshold by running the VACUUM command on the table.
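To see what VACUUM would actually delete before committing to it, a dry run can be used first. A minimal sketch, assuming a Delta table named my_table (a placeholder):

```sql
-- Preview the files that would be deleted; nothing is removed
VACUUM my_table RETAIN 168 HOURS DRY RUN;

-- Permanently delete unreferenced files older than the 168-hour threshold
VACUUM my_table RETAIN 168 HOURS;
```

If the dry run returns no files, the "missing" deletions are simply files that are still within the retention window or still referenced by the transaction log.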

Ref:

https://docs.databricks.com/delta/vacuum.html

https://docs.databricks.com/sql/language-manual/delta-vacuum.html
