10-31-2022 05:46 AM
Hello experts. We are trying to clarify how to clean up the large number of files accumulating in the _delta_log folder (json, crc and checkpoint files). We went through the related posts in the forum and followed the steps below:
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.logRetentionDuration'='interval 1 minutes', 'delta.deletedFileRetentionDuration'='interval 1 minutes');
VACUUM table_name RETAIN 0 HOURS
We understand that each time a checkpoint is written, Databricks automatically cleans up log entries older than the specified retention interval. However, after new checkpoints and commits, all the old log files are still there.
Could you please help? Just to mention that it is about tables where we don't need any time travel.
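To make the expected behaviour concrete, here is a small Python sketch of the cleanup rule described above. It is an illustrative model only, not Delta's actual implementation: it assumes log entries become removable once they are older than the retention interval and a newer checkpoint exists, which is the behaviour the question relies on.

```python
from datetime import datetime, timedelta

def expired_log_files(log_files, retention, now):
    """Illustrative model (not Delta's real code): after a checkpoint
    is written, log entries older than `retention` and older than the
    latest checkpoint become eligible for deletion.

    Each entry in `log_files` is a dict with "version", "type"
    ("json" or "checkpoint") and "mtime" (a datetime).
    """
    cutoff = now - retention
    checkpoints = [f for f in log_files if f["type"] == "checkpoint"]
    if not checkpoints:
        # Without a checkpoint, no log entry can be cleaned up.
        return []
    latest_ckpt_version = max(f["version"] for f in checkpoints)
    # Only entries strictly before the latest checkpoint AND past the
    # retention cutoff are removable.
    return [
        f for f in log_files
        if f["version"] < latest_ckpt_version and f["mtime"] < cutoff
    ]
```

Under this model, a 1-minute retention would indeed expire almost every pre-checkpoint log entry, which is why the observed behaviour (files still present) suggests the setting is not being honoured as written.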
10-11-2024 02:47 PM
Hi, has this been fixed since? We have seen similar issues. Thanks.
10-13-2024 03:42 AM
Hi @Brad , @elgeo ,
1. Regarding VACUUM: per the documentation, it does not remove log files.
2. Setting delta.logRetentionDuration to 1 minute is far too low and may not work.
The default is 30 days, and there is a safety check that prevents setting it below 7 days. More on this in this topic.
10-13-2024 11:56 AM
That's right, just a small note: the default retention threshold is 7 days.
10-13-2024 11:46 PM - edited 10-13-2024 11:51 PM
We are both right, but to be specific, I was referring to the VACUUM command. So effectively, if you run it on a table, by default VACUUM will delete data files from storage that are older than 7 days and no longer referenced by the Delta table's transaction log.
So, to make it clear:
delta.deletedFileRetentionDuration - default 7 days; unreferenced data files older than this retention period become eligible for deletion - triggered by the VACUUM command;
delta.logRetentionDuration - default 30 days; log entries older than this retention period are removed when a checkpoint is written - a built-in mechanism that does not need VACUUM.
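The division of labour between the two mechanisms can be sketched as follows. This is a simplified model of the behaviour described in this thread, not Delta's real code; the default windows (7 and 30 days) are the defaults mentioned above and are configurable via the table properties.

```python
from datetime import datetime, timedelta

# Defaults as described in this thread (configurable table properties):
DELETED_FILE_RETENTION = timedelta(days=7)   # delta.deletedFileRetentionDuration
LOG_RETENTION = timedelta(days=30)           # delta.logRetentionDuration

def cleanup_mechanism(path, mtime, referenced, now):
    """Which mechanism (if any) would remove a given file in this
    simplified model. `referenced` means the data file is still
    referenced by the table's transaction log."""
    age = now - mtime
    if path.startswith("_delta_log/"):
        # Log entries are removed by the built-in cleanup that runs
        # when a checkpoint is written - VACUUM never touches them.
        return "log-cleanup" if age > LOG_RETENTION else None
    # Data files: VACUUM removes them once unreferenced and old enough.
    if not referenced and age > DELETED_FILE_RETENTION:
        return "VACUUM"
    return None
```

This also explains the original observation: running VACUUM, even with RETAIN 0 HOURS, cannot shrink _delta_log; only the checkpoint-triggered log cleanup does that.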
10-13-2024 04:39 PM
Awesome, thanks for the response.