Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Clean up _delta_log files

elgeo
Valued Contributor II

Hello experts. We are trying to clarify how to clean up the large number of files accumulating in the _delta_log folder (json, crc and checkpoint files). We went through the related posts in the forum and ran the following:

SET spark.databricks.delta.retentionDurationCheck.enabled = false;

ALTER TABLE table_name
SET TBLPROPERTIES ('delta.logRetentionDuration'='interval 1 minutes', 'delta.deletedFileRetentionDuration'='interval 1 minutes');

VACUUM table_name RETAIN 0 HOURS;

We understand that each time a checkpoint is written, Databricks automatically cleans up log entries older than the specified retention interval. However, after new checkpoints and commits, all the log files are still there.

Could you please help? Just to mention that it is about tables where we don't need any time travel.

6 REPLIES

Brad
Contributor II

Hi, has this been fixed since then? We have seen similar issues. Thanks.

filipniziol
Contributor III

Hi @Brad , @elgeo ,

1. Per the documentation, VACUUM does not remove log files:
filipniziol_0-1728815753680.png

2. Setting delta.logRetentionDuration to 1 minute is way too low and may not work.

The default is 30 days, and there is a safety check that prevents setting it below 7 days. More on this in this topic

That's right, just a small note: the default threshold for the retention period is 7 days.

Hi @radothede ,

The default for delta.logRetentionDuration is 30 days, as per the documentation:

filipniziol_1-1728885273433.png


@filipniziol 

We are both right, but to be specific, I was referring to the VACUUM command: by default, when you run it on a table, VACUUM deletes data files from storage that are older than 7 days and no longer referenced by the Delta table's transaction log.

So, to make it clear:

delta.deletedFileRetentionDuration - default 7 days; data files older than this retention period are deleted when the VACUUM command runs;

delta.logRetentionDuration - default 30 days; log entries older than this retention period are removed when a checkpoint is written. This is a built-in mechanism and does not need VACUUM.
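
To make the split between the two properties concrete, here is a minimal sketch. It assumes a Delta table named table_name (the placeholder used earlier in this thread) and sets both properties explicitly to the default values mentioned above; adjust the intervals to your own needs.

```sql
-- Sketch only: table_name is a placeholder, and the intervals are
-- simply the defaults discussed in this thread (30 days / 7 days).
ALTER TABLE table_name
SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 30 days',
  'delta.deletedFileRetentionDuration' = 'interval 7 days'
);

-- Deletes unreferenced data files older than the retention threshold.
-- It does not remove files from _delta_log.
VACUUM table_name;

-- Check which properties are currently set on the table.
SHOW TBLPROPERTIES table_name;

-- Check which commits are still visible in the table history.
DESCRIBE HISTORY table_name;
```

Files in _delta_log are not expected to disappear right after these statements; as described above, log cleanup only happens when a new checkpoint is written.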

Brad
Contributor II

Awesome, thanks for the response.
