Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Why is the recommended default for the Delta deleted file retention duration 7 days?

Avvar2022
Contributor

Because of frequent updates to the table, our backend storage keeps growing. Even though we have VACUUM and OPTIMIZE scheduled, we are unable to clean up files that are 7 days old or less.

Current settings: delta.logRetentionDuration="interval 7 days", and the deleted file retention setting is left at its default (7 days).

According to the documentation, the recommended/default setting is delta.deletedFileRetentionDuration="interval 7 days" and delta.logRetentionDuration="interval 7 days", but I cannot find good documentation explaining why 7 days is recommended. What are the drawbacks if this is set to 24 hours?

My scenario:

My table is updated every 2-3 minutes, and I don't need the Delta log for more than one day. I would like to run VACUUM and OPTIMIZE daily to clean up and consolidate files.

I would appreciate it if you could share documentation or the rationale for the 7-day setting.

2 REPLIES

szymon_dybczak
Contributor III

Hi @Avvar2022 ,

I guess this recommendation is meant for cases where a bug is introduced and you don't notice it immediately, but only after a couple of days. Then you can restore the table to a previous version. So in such cases it's reasonable to have a retention period longer than one day.

But you can certainly decrease the log retention period if it's causing problems in your use case. It's a totally valid option.
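
For illustration, here is a rough sketch of what lowering both retention settings and restoring to an earlier version could look like. It only builds the SQL strings; the table name `main.demo.events`, the version number `42`, and the helper function names are made up for the example. On a cluster you would pass each statement to `spark.sql(...)`.

```python
def retention_properties_sql(table: str, hours: int) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties."""
    if hours <= 0:
        raise ValueError("hours must be a positive integer")
    interval = f"interval {hours} hours"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


def restore_sql(table: str, version: int) -> str:
    """Build a RESTORE statement that rolls the table back to an earlier version."""
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"RESTORE TABLE {table} TO VERSION AS OF {version}"


if __name__ == "__main__":
    table = "main.demo.events"  # hypothetical table name
    print(retention_properties_sql(table, 24))
    print(restore_sql(table, 42))  # hypothetical version, e.g. taken from DESCRIBE HISTORY
```

Keep in mind that a restore only works while the target version's log entries and data files still exist, which is exactly what a shorter retention period gives up.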

Witold
Honored Contributor

It's not really a recommendation per se; it's basically a default, which you simply need. And yes, it's supposed to be adapted to your specific needs.

In this case:


I don't need delta log more than one day.


If you're fine with not being able to roll back your data to previous days, then set it to one day; that's totally fine. You can even disable it completely if you don't need this feature at all.
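
As a minimal sketch of the daily job described in the question (assuming the retention properties are already set on the table), something like the following could be scheduled once a day. The function names and the table name are hypothetical, and `spark` is assumed to be an existing SparkSession. `VACUUM` without a `RETAIN` clause uses the table's `delta.deletedFileRetentionDuration`, so files replaced by `OPTIMIZE` are physically deleted only by a later run, once they are older than that threshold.

```python
from typing import List


def daily_maintenance_sql(table: str) -> List[str]:
    """Return the maintenance statements in the order they should run."""
    return [
        f"OPTIMIZE {table}",  # compact small files into larger ones
        f"VACUUM {table}",    # delete unreferenced files older than the retention threshold
    ]


def run_daily_maintenance(spark, table: str) -> None:
    """Execute the maintenance statements on an existing SparkSession."""
    for statement in daily_maintenance_sql(table):
        spark.sql(statement)


if __name__ == "__main__":
    # Hypothetical table name; printing instead of executing so this runs anywhere.
    for statement in daily_maintenance_sql("main.demo.events"):
        print(statement)
```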
