
How are deletedFileRetentionDuration and logRetentionDuration associated with VACUUM?

Shankar
New Contributor III

I am trying to learn more about the VACUUM operation and came across two properties:

  1. delta.deletedFileRetentionDuration
  2. delta.logRetentionDuration

So, let's say I have a Delta table where a few records/files have been deleted. delta.deletedFileRetentionDuration is set to its default (7 days) and delta.logRetentionDuration is set to its default (30 days).

What would happen if I ran VACUUM against the table with an interval of 200 days? These are some of the questions I have. I am a beginner, so kindly correct me if my understanding of the concept is wrong.

  1. Will the deleted file be completely cleaned up from storage only after 207 days (retention being 7 days and the vacuum interval 200 days)?
  2. As logRetentionDuration is set to only 30 days, from the 31st day onward would I no longer be able to see the delete transaction that happened on day 1, and no longer be able to travel back to the file deleted on day 1?
  3. If I have a vacuum interval of 200 days, should I ideally also set logRetentionDuration and deletedFileRetentionDuration to 200 days?

Thank you.

 

2 REPLIES

dasiekr
New Contributor II

No answers to these questions?

I also find the retention process for the underlying Parquet files unclear.

Can someone help here?

SubashDev
New Contributor II
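  • Will the deleted file be completely cleaned up from storage only after 207 days (retention being 7 days and the vacuum interval 200 days)?
    Not quite; the two numbers are not simply added. Logically deleted data files stay in storage until VACUUM runs, and VACUUM removes only files that are no longer referenced by the current table version and are older than the retention threshold. The threshold is delta.deletedFileRetentionDuration (7 days by default) unless the VACUUM command overrides it with RETAIN. Running VACUUM with RETAIN 4800 HOURS (200 days) therefore removes only files that were deleted more than 200 days ago. If instead VACUUM simply runs once every 200 days with the default retention, each run removes files deleted more than 7 days earlier, so in the worst case a file can linger for roughly 207 days.
  • As logRetentionDuration is set to only 30 days, from the 31st day onward would I no longer be able to see the delete transaction that happened on day 1, and no longer be able to travel back to the file deleted on day 1?
    Yes, broadly. Log entries older than delta.logRetentionDuration are cleaned up automatically when a checkpoint is written, so once that cleanup has happened the day-1 delete no longer appears in the table history and that version can no longer be reached by time travel. If needed, logRetentionDuration can be set to a longer period. It keeps only the log entries for that period, not the deleted data files; those are governed by deletedFileRetentionDuration and VACUUM.
  • If I have a vacuum interval of 200 days, should I ideally also set logRetentionDuration and deletedFileRetentionDuration to 200 days?
    Old data files are physically removed only when VACUUM runs, and delta.deletedFileRetentionDuration is the default threshold it uses. If you want to be able to go back 200 days, set delta.deletedFileRetentionDuration to 200 days so that VACUUM keeps the data files, and set delta.logRetentionDuration to 200 days as well so that the matching log entries are preserved. Time travel to an old version needs both.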
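
To make the arithmetic concrete, here is a small Python sketch of the two retention rules discussed in this thread. It is a simplified model and not Delta Lake's implementation. The helper names (vacuum_removes, history_available, vacuum_sql) and the table name my_table are made up for illustration. It assumes VACUUM removes a logically deleted file only once it is older than the retention threshold, and that log entries become unavailable once they are older than the log retention window. In practice, log cleanup happens only when a checkpoint is written, so entries can survive somewhat longer.

```python
from datetime import datetime, timedelta

# Defaults quoted in this thread: 7 days for deleted data files,
# 30 days for transaction log entries.
DELETED_FILE_RETENTION = timedelta(days=7)
LOG_RETENTION = timedelta(days=30)


def vacuum_removes(removed_at: datetime, vacuum_at: datetime,
                   retention: timedelta = DELETED_FILE_RETENTION) -> bool:
    """Simplified rule: a file logically deleted at `removed_at` is removed
    by a VACUUM run at `vacuum_at` only if it is older than `retention`."""
    return vacuum_at - removed_at > retention


def history_available(commit_at: datetime, now: datetime,
                      log_retention: timedelta = LOG_RETENTION) -> bool:
    """Simplified rule: a commit stays visible while it is inside the log
    retention window (real cleanup waits for the next checkpoint)."""
    return now - commit_at <= log_retention


def vacuum_sql(table: str, retain_days: int) -> str:
    """Build a VACUUM statement; the RETAIN clause is expressed in hours."""
    return f"VACUUM {table} RETAIN {retain_days * 24} HOURS"


day1 = datetime(2024, 1, 1)  # hypothetical date of the delete

# VACUUM with a 200-day threshold keeps the file on day 31 ...
print(vacuum_removes(day1, day1 + timedelta(days=30), timedelta(days=200)))   # False
# ... and removes it once it is more than 200 days old.
print(vacuum_removes(day1, day1 + timedelta(days=201), timedelta(days=200)))  # True
# With the default 30-day log retention, the day-1 commit is gone by day 32.
print(history_available(day1, day1 + timedelta(days=31)))                     # False
print(vacuum_sql("my_table", 200))  # VACUUM my_table RETAIN 4800 HOURS
```

The point of the sketch is that the 7-day default and the 200-day value are alternative thresholds, not amounts to be added together, and that the log window is tracked separately from the data files.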
