Data Engineering

Clean up _delta_log files

elgeo
Valued Contributor II

Hello experts. We are trying to clarify how to clean up the large number of files accumulating in the _delta_log folder (JSON, CRC, and checkpoint files). We went through the related posts in the forum and followed the steps below:

SET spark.databricks.delta.retentionDurationCheck.enabled = false;

ALTER TABLE table_name
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 1 minutes', 'delta.deletedFileRetentionDuration' = 'interval 1 minutes');

VACUUM table_name RETAIN 0 HOURS;

We understand that each time a checkpoint is written, Databricks automatically cleans up log entries older than the specified retention interval. However, after new checkpoints and commits, all the log files are still there.

Could you please help? Note that these are tables for which we don't need any time travel.
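
For anyone reproducing this, two standard commands help verify the state (table_name is a placeholder for your own table): DESCRIBE HISTORY lists the commits recorded in _delta_log, and DESCRIBE DETAIL shows the table's storage location so the folder can be inspected directly.

-- List the commits recorded in _delta_log (one row per table version)
DESCRIBE HISTORY table_name;

-- Show table metadata, including the storage location of _delta_log
DESCRIBE DETAIL table_name;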

7 REPLIES

Brad
Contributor II

Hi, has this been fixed in the meantime? We have seen similar issues. Thanks.

filipniziol
Esteemed Contributor

Hi @Brad, @elgeo,

1. Regarding VACUUM: it does not remove log files, per the documentation:

[screenshot of the VACUUM documentation]

2. Setting delta.logRetentionDuration to 1 minute is way too low and may not work.

The default is 30 days, and there is a safety check that prevents setting the retention below 7 days. More on this in this topic.
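
As a quick sanity check (a sketch; table_name is a placeholder), you can confirm what the table is actually configured with:

-- Show the table properties currently set, including
-- delta.logRetentionDuration and delta.deletedFileRetentionDuration
SHOW TBLPROPERTIES table_name;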

radothede
Valued Contributor II

That's right, just a small note: the default threshold for the retention period is 7 days.

filipniziol
Esteemed Contributor

Hi @radothede,

The default for delta.logRetentionDuration is 30 days, as per the documentation:

[screenshot of the delta.logRetentionDuration documentation]
radothede
Valued Contributor II

@filipniziol 

We are both right, but to be specific, I was referring to the VACUUM command. Effectively, if you run it on a table, by default VACUUM will delete data files from storage that are older than 7 days and no longer referenced by the Delta table's transaction log.

So, to make it clear:

delta.deletedFileRetentionDuration - default 7 days; controls how long data files no longer referenced by the transaction log are kept before VACUUM deletes them;

delta.logRetentionDuration - default 30 days; log files older than this are removed when a new checkpoint is written - a built-in mechanism that does not need VACUUM.
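
A minimal sketch of setting the two properties together (table_name is a placeholder; the values shown are the documented defaults, so adjust to your needs):

-- Keep unreferenced data files for 7 days and log history for 30 days
ALTER TABLE table_name
SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 7 days',
  'delta.logRetentionDuration' = 'interval 30 days'
);

-- Deletes unreferenced data files older than the retention above
VACUUM table_name;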

Brad
Contributor II

Awesome, thanks for the response.

michaeljac1986
Visitor

What you're seeing is expected behavior: the _delta_log folder always keeps a history of JSON commit files, checkpoint files, and CRCs. Even if you lower delta.logRetentionDuration and run VACUUM, cleanup won't happen immediately. A couple of points to note:

  • The property delta.logRetentionDuration controls how long log history is kept for time travel, but actual cleanup only happens when a new checkpoint is written and retention thresholds are met.

  • Setting it to something like 1 minute will disable time travel almost immediately, but you still need to wait for the next compaction/checkpoint cycle to actually drop files (see the sketch after this list).

  • VACUUM only removes data files, not log files, so it won't reduce _delta_log size on its own.
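
Relevant to the checkpoint cycle mentioned above (a sketch; table_name is a placeholder): by default Delta writes a checkpoint roughly every 10 commits, a cadence controlled by the delta.checkpointInterval table property, so lowering it makes the cleanup opportunity come around sooner:

-- Write a checkpoint every 5 commits instead of the default 10
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.checkpointInterval' = '5');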

If you really don't need any history/time travel, the supported approach is to:

  1. Set spark.databricks.delta.retentionDurationCheck.enabled = false.

  2. Use a very small delta.logRetentionDuration (like interval 1 minute).

  3. Trigger a few commits (inserts/updates) so new checkpoints are written.

  4. Delta will then automatically prune older JSON and CRC files beyond the retention window.

Also note that the _delta_log folder will never be completely empty: at least the most recent checkpoint plus a few commit files are always retained.
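
Putting those steps together, a minimal sketch (table_name is a placeholder; for step 3, run your regular writes, since each commit counts toward the next checkpoint):

-- 1. Disable the safety check enforcing the minimum retention
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

-- 2. Shrink the log retention window (this gives up time travel)
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 1 minutes');

-- 3. Run a few normal commits (INSERT/UPDATE/MERGE) on the table
--    so that a new checkpoint gets written

-- 4. At the next checkpoint, JSON/CRC files older than the window
--    are pruned automatically; no extra command is needed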
