Hi @mohit7,
There are two separate mechanisms at play here, and it helps to distinguish between them because log compaction itself does not remove your original commit JSON files.
WHAT LOG COMPACTION DOES
Log compaction creates additional .compacted.json files in the _delta_log directory. These files aggregate actions from a range of commit versions into a single file to speed up snapshot construction. Per the Delta Lake protocol, these are purely supplemental: the original numbered JSON commit files (e.g., 00000000000000000004.json) are not deleted by compaction itself. Readers can optionally use .compacted.json files instead of reading each individual commit, but the originals remain.
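To make the relationship concrete, here is a small Python sketch of the naming scheme and the reader-side substitution described above. The zero-padded `<start>.<end>.compacted.json` naming follows the protocol spec; the `files_for_snapshot` logic is a simplified illustration of how a reader can swap a compaction file in for the commits it covers, not Delta's actual implementation:

```python
# Commit files are named with a 20-digit zero-padded version number.
def commit_file(version: int) -> str:
    return f"{version:020d}.json"

# A compaction file covering commit versions start..end (inclusive).
def compacted_file(start: int, end: int) -> str:
    return f"{start:020d}.{end:020d}.compacted.json"

def files_for_snapshot(version, compactions):
    """Pick the log files needed to build a snapshot at `version`.
    compactions: list of (start, end) ranges with available .compacted.json
    files. Uses a compaction file whenever one starts at the current version
    and fits within the snapshot window; otherwise reads the commit itself."""
    files, v = [], 0
    while v <= version:
        covering = [(s, e) for (s, e) in compactions if s == v and e <= version]
        if covering:
            s, e = max(covering, key=lambda r: r[1])  # widest applicable range
            files.append(compacted_file(s, e))
            v = e + 1
        else:
            files.append(commit_file(v))
            v += 1
    return files
```

Note that with compaction disabled on the read side (see below), the reader simply takes the `commit_file` branch for every version, and since the originals are never deleted by compaction, both paths produce the same snapshot.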
WHAT REMOVES LOG FILES (METADATA CLEANUP)
The actual removal of old JSON commit files is handled by metadata cleanup, which runs automatically after checkpointing. This process is governed by the delta.logRetentionDuration table property (default: 30 days). When metadata cleanup runs, it deletes commit JSON files, checkpoint files, and log compaction files older than the retention threshold.
If log files are disappearing sooner than you expect, the root cause is almost certainly metadata cleanup and its retention setting, not log compaction.
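The retention behavior can be sketched as follows. This is a deliberately simplified model for illustration only: the real metadata cleanup also preserves whatever is still needed to reconstruct the table from the latest checkpoint, and runs as part of checkpointing rather than as a standalone pass:

```python
from datetime import datetime, timedelta

def cleanup_candidates(log_files, now, retention=timedelta(days=30)):
    """Return the _delta_log files older than the retention threshold.
    log_files: dict mapping file name -> last modification time.
    retention defaults to 30 days, mirroring delta.logRetentionDuration."""
    cutoff = now - retention
    return sorted(name for name, mtime in log_files.items() if mtime < cutoff)
```

Raising `delta.logRetentionDuration` widens the `retention` window here, which is exactly why it is the right knob if your goal is longer time travel (see below).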
DISABLING LOG COMPACTION ON THE WRITE SIDE
To disable the creation of .compacted.json files, you can set the following Spark configuration:
spark.conf.set("spark.databricks.delta.deltaLog.minorCompaction.useForWrites", "false")
You already found the read-side config:
spark.conf.set("spark.databricks.delta.deltaLog.minorCompaction.useForReads", "false")
Setting useForWrites to false prevents new .compacted.json files from being created during write operations. Setting useForReads to false tells the reader to ignore existing .compacted.json files and read the individual commit files instead.
You can set these at the cluster level in your Spark configuration, or at session level in a notebook.
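At the cluster level, the same two settings (taken verbatim from the configs above) go into the cluster's Spark config as key-value pairs:

```
spark.databricks.delta.deltaLog.minorCompaction.useForWrites false
spark.databricks.delta.deltaLog.minorCompaction.useForReads false
```

The cluster-level form applies to every session on that cluster, while the `spark.conf.set(...)` calls affect only the current session.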
PRESERVING LOG FILES FOR LONGER TIME TRAVEL
If your goal is to retain the ability to perform time travel further back, the more direct approach is to increase the log retention duration on the table:
ALTER TABLE your_catalog.your_schema.your_table
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 90 days');
This tells Delta to keep log entries for 90 days (adjust as needed) before metadata cleanup removes them. Keep in mind that you also need your data file retention to support the time travel window:
ALTER TABLE your_catalog.your_schema.your_table
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 90 days');
Note: as of Databricks Runtime 18.0 and above, logRetentionDuration must be greater than or equal to deletedFileRetentionDuration.
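You can sanity-check that constraint before running the ALTER statements. The sketch below uses a simplified parser that only handles the `interval N days` form shown above (Delta also accepts other units such as hours); it is an illustration, not Delta's own validation:

```python
import re
from datetime import timedelta

def parse_interval(value: str) -> timedelta:
    """Parse the 'interval N days' form used in the TBLPROPERTIES above.
    Simplified for illustration; other interval units are not handled."""
    m = re.fullmatch(r"interval\s+(\d+)\s+days?", value.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unsupported interval: {value!r}")
    return timedelta(days=int(m.group(1)))

def retentions_consistent(log_retention: str, deleted_file_retention: str) -> bool:
    # The DBR 18.0+ rule: log retention must be >= data-file retention,
    # so the log entries outlive the data files they reference.
    return parse_interval(log_retention) >= parse_interval(deleted_file_retention)
```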
SUMMARY
1. To stop creating .compacted.json files: set spark.databricks.delta.deltaLog.minorCompaction.useForWrites to false
2. To stop reading .compacted.json files: set spark.databricks.delta.deltaLog.minorCompaction.useForReads to false
3. To preserve log files longer for time travel: increase delta.logRetentionDuration on the table
4. To preserve data files longer for time travel: increase delta.deletedFileRetentionDuration to match
For more detail on log retention and time travel:
https://docs.databricks.com/en/delta/history.html
For the Delta Lake protocol specification on log compaction:
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#log-compaction-files
* This reply was drafted with an agent system I built, which researches responses using the documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.