12-05-2024 01:15 AM
Hello Databricks Community,
I am experiencing an issue with Delta Lake where the _delta_log files are not being deleted automatically in a GCS bucket, even though I have set the table properties to enable this behavior. Here is the configuration I used:
ALTER TABLE delta.`gs://sample-data` SET TBLPROPERTIES (
  'retentionDurationCheck.enabled' = 'false',
  'delta.logRetentionDuration' = 'interval 1 days',
  'delta.deletedFileRetentionDuration' = 'interval 1 days',
  'delta.autoOptimize.optimizeWrite' = 'false',
  'delta.autoOptimize.autoCompact' = 'true',
  'delta.targetFileSize' = '1073741824'
);
Despite these settings, the log files remain in the directory beyond the specified retention period. I understand that log files should be deleted automatically after checkpoint operations, and I have ensured that checkpoints are being created.
Could there be any specific reasons or additional configurations required for these settings to take effect? Is there a known issue with certain environments or configurations that might prevent the automatic deletion of Delta log files?
I appreciate any insights or suggestions from those who have encountered and resolved similar issues.
Hung Nguyen
Labels: Delta Lake, Spark
Accepted Solutions
12-10-2024 05:09 AM
Hi, no worries @minhhung0507 .
Check whether the _delta_log files in question still reference data files that fall within the retention period. Delta Lake retains logs for files that are still active or could be required for transactional consistency and time travel.
If you notice inconsistencies across different tables, they might be due to differences in how checkpoints are created or how the retention period is managed for each table. Check the table properties, keeping in mind that default values apply wherever a property is not set, and ensure that the retention and checkpointing configurations are consistent across all tables. [1]
Delta Lake's log files are deleted automatically and asynchronously after checkpoint operations. If this is not happening, there might be an issue with the cleanup mechanism itself.
Try running the VACUUM command on the Delta table to remove data files that are no longer referenced by the table. Note, however, that VACUUM does not govern the deletion of log files. Log files are managed separately and are deleted after checkpoint operations. [1]
[1] https://docs.databricks.com/en/delta/history.html#configure-data-retention-for-time-travel-queries
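As a rough illustration of the advice to compare table properties and account for defaults, here is a minimal Python sketch. It assumes you have already collected each table's explicitly set properties (for example from SHOW TBLPROPERTIES) into plain dictionaries. The table names, the helper names, and the default values listed are assumptions for illustration; confirm the defaults against the documentation for your runtime.

```python
# Assumed defaults, as commonly documented for Delta Lake. Verify them
# against the runtime you actually use before relying on this.
ASSUMED_DEFAULTS = {
    "delta.logRetentionDuration": "interval 30 days",
    "delta.deletedFileRetentionDuration": "interval 7 days",
    "delta.checkpointInterval": "10",
}


def effective_properties(explicit):
    """Overlay the explicitly set properties on top of the assumed defaults."""
    merged = dict(ASSUMED_DEFAULTS)
    merged.update(explicit)
    return merged


def diff_tables(tables):
    """Return {property: {table: value}} for every property whose
    effective value is not the same across all tables."""
    effective = {name: effective_properties(props) for name, props in tables.items()}
    keys = sorted({key for props in effective.values() for key in props})
    differences = {}
    for key in keys:
        values = {name: props.get(key) for name, props in effective.items()}
        if len(set(values.values())) > 1:
            differences[key] = values
    return differences


# Hypothetical input: table_a sets the log retention explicitly, table_b does not.
tables = {
    "table_a": {"delta.logRetentionDuration": "interval 1 days"},
    "table_b": {},
}
print(diff_tables(tables))
```

A property that is missing on one table silently falls back to its default, so two tables can look alike at a glance while one of them keeps logs far longer than the other.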
12-06-2024 08:25 AM
Thank you for sharing the details. A couple of key points to clarify and verify in this scenario:
- How are you confirming that the _delta_log files should have been deleted? It’s important to verify that the retention period has indeed elapsed and that checkpoints have been created, as log file cleanup typically occurs after a checkpoint operation.
- Have you checked if the _delta_log files in question still reference data files that fall within the retention period? Delta Lake retains logs for files that are still active or could be required for transactional consistency and time travel.
These details will help narrow down whether the issue is with the cleanup mechanism or if the files are still required for data consistency. Let us know, and we’ll assist further!
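One way to approach the first question is to work from a listing of the _delta_log directory. The Python sketch below is a simplified model, not Delta Lake's actual cleanup code: it assumes commit files are named with a 20-digit version plus `.json`, that checkpoints use `.checkpoint.parquet` (optionally multi-part), that a commit is only a cleanup candidate if a later checkpoint exists, and that the retention cutoff is rounded down to the start of the UTC day, as the open-source implementation appears to do. The function names and the sample timestamps are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed file-name patterns inside _delta_log.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def latest_checkpoint_version(names):
    """Return the highest checkpoint version in the listing, or None."""
    versions = [int(m.group(1)) for n in names if (m := CHECKPOINT_RE.match(n))]
    return max(versions) if versions else None


def cleanup_candidates(files, retention, now):
    """files is a list of (name, modified_at) pairs with timezone-aware times.

    Returns the commit files that this simplified model would expect to be
    removed: older than the day-truncated cutoff and below the latest checkpoint.
    """
    checkpoint = latest_checkpoint_version(name for name, _ in files)
    if checkpoint is None:
        return []  # without a checkpoint, nothing can be cleaned up safely
    # Assumption: the cutoff is truncated to midnight UTC.
    cutoff = (now - retention).replace(hour=0, minute=0, second=0, microsecond=0)
    result = []
    for name, modified_at in files:
        match = COMMIT_RE.match(name)
        if match and int(match.group(1)) < checkpoint and modified_at <= cutoff:
            result.append(name)
    return sorted(result)


# Made-up listing for illustration only.
utc = timezone.utc
now = datetime(2024, 12, 10, 12, 0, tzinfo=utc)
files = [
    (f"{8:020d}.json", datetime(2024, 12, 8, 6, 0, tzinfo=utc)),
    (f"{9:020d}.json", datetime(2024, 12, 9, 6, 0, tzinfo=utc)),
    (f"{10:020d}.json", datetime(2024, 12, 10, 6, 0, tzinfo=utc)),
    (f"{10:020d}.checkpoint.parquet", datetime(2024, 12, 10, 6, 0, tzinfo=utc)),
    ("_last_checkpoint", datetime(2024, 12, 10, 6, 0, tzinfo=utc)),
]
print(cleanup_candidates(files, timedelta(days=1), now))
```

Under these assumptions, version 9 is more than 24 hours old yet is not a candidate, because it was written after the day-truncated cutoff. With a one-day retention, a log file can therefore plausibly survive for close to two days before cleanup would touch it.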
12-10-2024 02:06 AM
Dear @VZLA,
I apologize for the delayed response due to some unforeseen circumstances. Regarding your questions:
- To confirm that the _delta_log files should have been deleted, I have been monitoring the retention period and ensuring that checkpoints have been created. However, I’ve noticed inconsistencies across different tables. In some cases, the cleanup mechanism seems to work as expected, while in others, it does not. This discrepancy is puzzling.
- I have checked the _delta_log files in question, and it appears that some still reference data files within the retention period. This leads to uncertainty about whether the logs are being retained for transactional consistency or if there is an issue with the cleanup process itself.
Additionally, I’ve observed that certain tables create checkpoints after 10 transactions in the _delta_log, while others do not. I am unsure why this behavior differs among tables. I appreciate your assistance in narrowing down this issue, and I look forward to your guidance.
Hung Nguyen
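To make the checkpoint observation above easier to compare between tables, the following Python sketch reports how many commits have accumulated since the latest checkpoint in a _delta_log listing. It assumes the usual file-name patterns and an interval of 10 commits, which matches the documented open-source default for delta.checkpointInterval; the function name and the sample listing are hypothetical, and since checkpoints are written asynchronously, a small overshoot is not necessarily a problem.

```python
import re

# Assumed file-name patterns inside _delta_log.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def checkpoint_gap(names, interval=10):
    """Summarise the distance between the latest commit and latest checkpoint.

    Returns None when the listing contains no commit files.
    """
    commits = [int(m.group(1)) for n in names if (m := COMMIT_RE.match(n))]
    checkpoints = [int(m.group(1)) for n in names if (m := CHECKPOINT_RE.match(n))]
    if not commits:
        return None
    latest_commit = max(commits)
    latest_checkpoint = max(checkpoints) if checkpoints else None
    # With no checkpoint at all, every commit from version 0 counts.
    baseline = latest_checkpoint if latest_checkpoint is not None else -1
    gap = latest_commit - baseline
    return {
        "latest_commit": latest_commit,
        "latest_checkpoint": latest_checkpoint,
        "commits_since_checkpoint": gap,
        "overdue": gap > interval,
    }


# Made-up listing: commits 0..25, but only one checkpoint at version 10.
names = [f"{v:020d}.json" for v in range(26)] + [f"{10:020d}.checkpoint.parquet"]
print(checkpoint_gap(names))
```

Running this against the listings of a table that cleans up its logs and one that does not would show whether the second table is actually missing checkpoints or merely lagging behind.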
12-12-2024 02:48 AM
Dear @VZLA ,
Thank you very much for your detailed and helpful response regarding the _delta_log files issue. Your expertise is greatly appreciated.
I've followed your advice and conducted a thorough check on several tables. Interestingly, I've discovered that despite having identical table properties, some tables automatically delete their _delta_log files while others do not. This inconsistency is quite puzzling.
Given this observation, I believe it would be beneficial to continue monitoring this issue closely. Your insights have provided a solid foundation for further investigation, and I'll keep a keen eye on the behavior of these tables moving forward.
Once again, thank you for your time and valuable assistance. Your guidance has been instrumental in helping me understand and address this complex issue.
Hung Nguyen
12-12-2024 06:36 AM
Glad it helps, and I agree with monitoring this behavior closely. Should you need further assistance, please don't hesitate to reach out.

