
How are deletedFileRetentionDuration and logRetentionDuration associated with VACUUM?

New Contributor III

I am trying to learn more about the VACUUM operation and came across two properties:

  1. delta.deletedFileRetentionDuration
  2. delta.logRetentionDuration

So, let's say I have a Delta table where a few records/files have been deleted. delta.deletedFileRetentionDuration is set to its default (7 days), and delta.logRetentionDuration is set to its default (30 days).
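
For concreteness, the setup described above could be written out roughly as follows. This is only a sketch: the table name `events` is hypothetical, and the helper merely builds the SQL text (which would normally be passed to `spark.sql(...)` on a cluster with Delta Lake available) rather than executing anything.

```python
def set_retention_sql(table: str, deleted_days: int = 7, log_days: int = 30) -> str:
    """Build an ALTER TABLE statement that sets both retention properties.

    The defaults mirror the values mentioned in the post (7 and 30 days).
    """
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.deletedFileRetentionDuration' = 'interval {deleted_days} days', "
        f"'delta.logRetentionDuration' = 'interval {log_days} days')"
    )


# `events` is a made-up table name used purely for illustration.
print(set_retention_sql("events"))
```

Setting the properties explicitly like this should be equivalent to leaving them unset, since these values are the defaults.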

What would happen if I run VACUUM against the table with an interval of 200 days? Below are some of the questions I have. I am a beginner, so please correct me if my understanding of the concept is wrong.

  1. Will the deleted file be completely removed from storage only after 207 days (7 days of retention plus the 200-day vacuum interval)?
  2. Since logRetentionDuration is set to only 30 days, does that mean that from the 31st day I can no longer see which delete transaction happened on day 1, and can no longer travel back to the file deleted on day 1?
  3. If I use a vacuum interval of 200 days, should I ideally set logRetentionDuration and deletedFileRetentionDuration to 200 days as well?
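
The timeline behind question 1 can be sketched as below. This is a simplified model, not Delta Lake's actual implementation, and it assumes that "interval 200 days" means the retention threshold passed to the command (`VACUUM ... RETAIN 4800 HOURS`), not how often VACUUM is scheduled. It also assumes the commonly documented rule that VACUUM physically removes files that are no longer referenced by the table and are older than the retention threshold, with delta.deletedFileRetentionDuration supplying the threshold only when RETAIN is omitted. The table name and dates are hypothetical.

```python
from datetime import datetime, timedelta


def vacuum_sql(table: str, retain_days: int) -> str:
    """Build a VACUUM statement; the RETAIN clause is expressed in hours."""
    return f"VACUUM {table} RETAIN {retain_days * 24} HOURS"


def eligible_for_vacuum(removed_at: datetime, run_at: datetime, retain: timedelta) -> bool:
    """Simplified rule: a file logically removed at `removed_at` can be
    physically deleted by a VACUUM run at `run_at` only if the removal
    happened before the cutoff `run_at - retain`."""
    return removed_at < run_at - retain


deleted_on = datetime(2024, 1, 1)   # hypothetical day the DELETE was committed
retain = timedelta(days=200)        # threshold passed via RETAIN

print(vacuum_sql("events", 200))
for day in (7, 199, 201, 207):
    run_at = deleted_on + timedelta(days=day)
    print(day, eligible_for_vacuum(deleted_on, run_at, retain))
```

Under these assumptions the two durations are not added together: the file becomes eligible once it has been removed for longer than the threshold in effect for that run (just over 200 days here), not after 207 days. Separately, time travel to an old version needs both its log entries and its data files to still exist, so whichever retention setting is shorter limits how far back one can go.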

Thank you.



New Contributor II

No answers to these questions?

I also find the retention process for the underlying Parquet files unclear.

Can someone help here?
