How to disable Delta Lake log compaction (.compacted.json files)

mohit7
New Contributor

Hi,

How can we disable the automatic creation of .compacted.json files in Delta Lake _delta_log folders?

Recently, some of our log files less than 30 days old were removed due to this compaction, affecting our ability to time travel.

We found spark.databricks.delta.deltaLog.minorCompaction.useForReads for the read side, but need to disable the write side.

Is there a Spark config or table property to disable this?

Kirankumarbs
Contributor

Hi @mohit7 ,

A couple of questions so I can answer you better:

- Are you using UC managed storage?
- Check delta.logRetentionDuration, which defaults to 30 days. (This is the most likely reason you cannot time travel past your checkpoints.)
- Check the VACUUM retention, which defaults to 7 days and controls removal of data files that are no longer referenced.

Log compaction is an internal Delta/Databricks mechanism. It is distinct from data file compaction and is not exposed as a setting the way auto-compact and optimized writes are.

K_Anudeep
Databricks Employee

Hey @mohit7 ,

Delta log file cleanup is managed by Delta itself, and there is no supported way to disable it. What you can do is increase the retention duration to a higher value, say > 30 days, so that you can time travel further back.

Also, even if the delta log files and compacted log files are removed, you can still time-travel if you have a checkpoint file created.

Anudeep

SteveOstrowski
Databricks Employee

Hi @mohit7,

There are two separate mechanisms at play here, and it helps to distinguish between them because log compaction itself does not remove your original commit JSON files.

WHAT LOG COMPACTION DOES

Log compaction creates additional .compacted.json files in the _delta_log directory. These files aggregate actions from a range of commit versions into a single file to speed up snapshot construction. Per the Delta Lake protocol, these are purely supplemental: the original numbered JSON commit files (e.g., 00000000000000000004.json) are not deleted by compaction itself. Readers can optionally use .compacted.json files instead of reading each individual commit, but the originals remain.
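You can see this distinction directly in the file names. Per the Delta Lake protocol, commit files are named with a single zero-padded version (`<version>.json`) while compaction files carry a start and end version (`<start>.<end>.compacted.json`). A minimal sketch that classifies `_delta_log` entries by that naming convention (the function name is mine, for illustration):

```python
import re

# Filename patterns in _delta_log/, per the Delta Lake protocol:
#   commit files:      <version>.json                 (version zero-padded to 20 digits)
#   compaction files:  <start>.<end>.compacted.json   (both versions zero-padded)
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
COMPACTED_RE = re.compile(r"^(\d{20})\.(\d{20})\.compacted\.json$")

def classify_log_file(name: str) -> str:
    """Return 'commit', 'compacted', or 'other' for a _delta_log file name."""
    if COMMIT_RE.match(name):
        return "commit"
    if COMPACTED_RE.match(name):
        return "compacted"
    return "other"
```

Running this over a directory listing of `_delta_log` makes it easy to confirm that both the original commit files and the supplemental compaction files are present.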

WHAT REMOVES LOG FILES (METADATA CLEANUP)

The actual removal of old JSON commit files is handled by metadata cleanup, which runs automatically after checkpointing. This process is governed by the delta.logRetentionDuration table property (default: 30 days). When metadata cleanup runs, it deletes commit JSON files, checkpoint files, and log compaction files older than the retention threshold.

If you are seeing log files disappear before you expect, the root cause is likely metadata cleanup, not log compaction.
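The retention arithmetic behind metadata cleanup can be sketched as follows. This is an illustrative helper, not Delta's actual implementation (which, among other things, also requires a newer checkpoint to exist before files become removable):

```python
from datetime import datetime, timedelta

def eligible_for_cleanup(file_mtime: datetime, now: datetime,
                         log_retention: timedelta = timedelta(days=30)) -> bool:
    """Sketch: a log file modified before (now - retention) is a candidate
    for metadata cleanup. Delta's real cleanup additionally requires that a
    newer checkpoint covers the versions being removed."""
    return file_mtime < now - log_retention
```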

DISABLING LOG COMPACTION ON THE WRITE SIDE

To disable the creation of .compacted.json files, you can set the following Spark configuration:

spark.conf.set("spark.databricks.delta.deltaLog.minorCompaction.useForWrites", "false")

You already found the read-side config:

spark.conf.set("spark.databricks.delta.deltaLog.minorCompaction.useForReads", "false")

Setting useForWrites to false prevents new .compacted.json files from being created during write operations. Setting useForReads to false tells the reader to ignore existing .compacted.json files and read the individual commit files instead.

You can set these at the cluster level in your Spark configuration, or at session level in a notebook.
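At the cluster level, these would be entered as key-value pairs in the cluster's Spark config (values shown assume you want both write and read sides disabled):

```
spark.databricks.delta.deltaLog.minorCompaction.useForWrites false
spark.databricks.delta.deltaLog.minorCompaction.useForReads false
```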

PRESERVING LOG FILES FOR LONGER TIME TRAVEL

If your goal is to retain the ability to perform time travel further back, the more direct approach is to increase the log retention duration on the table:

ALTER TABLE your_catalog.your_schema.your_table
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 90 days');

This tells Delta to keep log entries for 90 days (adjust as needed) before metadata cleanup removes them. Keep in mind that you also need your data file retention to support the time travel window:

ALTER TABLE your_catalog.your_schema.your_table
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 90 days');

Note: as of Databricks Runtime 18.0 and above, logRetentionDuration must be greater than or equal to deletedFileRetentionDuration.
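Putting the two properties together, the effective time travel window is bounded by the smaller of the two retentions, since you need both the log entries and the data files. A rough sketch of that reasoning (my own helper names; the interval parsing handles only the 'interval N days' form, not Delta's full interval syntax):

```python
import re
from datetime import datetime, timedelta

def parse_interval_days(value: str) -> timedelta:
    """Parse a Delta-style 'interval N days' property value (days-only sketch)."""
    m = re.match(r"^interval\s+(\d+)\s+days?$", value.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unsupported interval: {value!r}")
    return timedelta(days=int(m.group(1)))

def earliest_time_travel(now: datetime, log_retention: str,
                         deleted_file_retention: str) -> datetime:
    """Earliest timestamp you can expect to time travel to, given both retentions."""
    log_td = parse_interval_days(log_retention)
    data_td = parse_interval_days(deleted_file_retention)
    if log_td < data_td:
        # Mirrors the DBR 18.0+ rule: logRetentionDuration >= deletedFileRetentionDuration
        raise ValueError("logRetentionDuration must be >= deletedFileRetentionDuration")
    # Time travel needs both the log entries and the data files,
    # so the window is limited by the smaller retention.
    return now - min(log_td, data_td)
```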

SUMMARY

1. To stop creating .compacted.json files: set spark.databricks.delta.deltaLog.minorCompaction.useForWrites to false
2. To stop reading .compacted.json files: set spark.databricks.delta.deltaLog.minorCompaction.useForReads to false
3. To preserve log files longer for time travel: increase delta.logRetentionDuration on the table
4. To preserve data files longer for time travel: increase delta.deletedFileRetentionDuration to match

For more detail on log retention and time travel:
https://docs.databricks.com/en/delta/history.html

For the Delta Lake protocol specification on log compaction:
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#log-compaction-files

* This reply was drafted with an agent system I built, which researches and drafts responses based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.