<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Clean up _delta_log files in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/24752#M17229</link>
    <description>&lt;P&gt;Hello experts. We are trying to clarify how to clean up the large amount of files that are being accumulated in the _delta_log folder (json, crc and checkpoint files). We went through the related posts in the forum and followed the below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;SET spark.databricks.delta.retentionDurationCheck.enabled = false;&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;ALTER TABLE table_name&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;SET TBLPROPERTIES ('delta.logRetentionDuration'='interval 1 minutes', 'delta.deletedFileRetentionDuration'='interval 1 minutes');&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;VACUUM &lt;I&gt;table_name&lt;/I&gt;&amp;nbsp;RETAIN 0 HOURS&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We understand that each time a checkpoint is written, Databricks automatically cleans up log entries older than the specified retention interval. However, after new checkpoints and commits, all the log files are still there. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could you please help? Just to mention that it is about tables where we don't need any time travel. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 31 Oct 2022 12:46:17 GMT</pubDate>
    <dc:creator>elgeo</dc:creator>
    <dc:date>2022-10-31T12:46:17Z</dc:date>
    <item>
      <title>Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/24752#M17229</link>
      <description>&lt;P&gt;Hello experts. We are trying to clarify how to clean up the large amount of files that are being accumulated in the _delta_log folder (json, crc and checkpoint files). We went through the related posts in the forum and followed the below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;SET spark.databricks.delta.retentionDurationCheck.enabled = false;&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;ALTER TABLE table_name&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;SET TBLPROPERTIES ('delta.logRetentionDuration'='interval 1 minutes', 'delta.deletedFileRetentionDuration'='interval 1 minutes');&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;VACUUM &lt;I&gt;table_name&lt;/I&gt;&amp;nbsp;RETAIN 0 HOURS&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We understand that each time a checkpoint is written, Databricks automatically cleans up log entries older than the specified retention interval. However, after new checkpoints and commits, all the log files are still there. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could you please help? Just to mention that it is about tables where we don't need any time travel. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2022 12:46:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/24752#M17229</guid>
      <dc:creator>elgeo</dc:creator>
      <dc:date>2022-10-31T12:46:17Z</dc:date>
    </item>
    <item>
      <title>Re: Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93657#M38742</link>
      <description>&lt;P&gt;Hi, has this been fixed since? We have seen similar issues. Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Oct 2024 21:47:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93657#M38742</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-10-11T21:47:41Z</dc:date>
    </item>
    <item>
      <title>Re: Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93698#M38746</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/100643"&gt;@MikeGo&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/69823"&gt;@elgeo&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;1. Regarding VACUUM it does not remove log files&amp;nbsp;&lt;A href="https://docs.delta.io/latest/delta-utility.html" target="_self"&gt;per documentation&lt;/A&gt;:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="filipniziol_0-1728815753680.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11865i906A37D2E6F1CEED/image-size/medium?v=v2&amp;amp;px=400" role="button" title="filipniziol_0-1728815753680.png" alt="filipniziol_0-1728815753680.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;2. Setting 1 minute as delta.logRetentionDuration is way to low and may not work.&lt;/P&gt;&lt;P&gt;The default is 30 days and there is safety check that prevents setting it below 7 days. More on this in &lt;A href="https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/td-p/20368" target="_self"&gt;this topic&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 13 Oct 2024 10:42:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93698#M38746</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2024-10-13T10:42:45Z</dc:date>
    </item>
    <item>
      <title>Re: Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93736#M38751</link>
      <description>&lt;P&gt;That's right, just a small note: the&amp;nbsp;&lt;SPAN&gt;default threshold for VACUUM's deleted-file retention is 7 days.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 13 Oct 2024 18:56:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93736#M38751</guid>
      <dc:creator>radothede</dc:creator>
      <dc:date>2024-10-13T18:56:07Z</dc:date>
    </item>
    <item>
      <title>Re: Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93746#M38757</link>
      <description>&lt;P&gt;Awesome, thanks for response.&lt;/P&gt;</description>
      <pubDate>Sun, 13 Oct 2024 23:39:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93746#M38757</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-10-13T23:39:17Z</dc:date>
    </item>
    <item>
      <title>Re: Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93766#M38760</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/104480"&gt;@radothede&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;The default for&amp;nbsp;delta.&lt;SPAN&gt;logRetentionDuration&lt;/SPAN&gt; is 30 days as &lt;A href="https://docs.databricks.com/en/delta/history.html" target="_self"&gt;per documentation&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="filipniziol_1-1728885273433.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11879iD806129DFA83A73A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="filipniziol_1-1728885273433.png" alt="filipniziol_1-1728885273433.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Oct 2024 05:55:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93766#M38760</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2024-10-14T05:55:56Z</dc:date>
    </item>
    <item>
      <title>Re: Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93770#M38763</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117376"&gt;@filipniziol&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are both right, but to be specific, I was referring VACUUM command - so effectively, if You run it on a table,&amp;nbsp;by default, the VACUUM command will delete data files older than 7 days from the storage that are no longer referenced by the delta table's transaction log.&lt;/P&gt;&lt;P&gt;So, to make it clear:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;delta.deletedFileRetentionDuration - default 7 days&lt;/STRONG&gt;, deletes data older than specified retention period - triggered by VACUUM command;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;delta.logRetentionDuration - default 30 days&lt;/STRONG&gt;, removes logs older than retention period while overwriting the checkpoint file - build-in mechanism, does not need VACUUM;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Oct 2024 06:51:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/93770#M38763</guid>
      <dc:creator>radothede</dc:creator>
      <dc:date>2024-10-14T06:51:49Z</dc:date>
    </item>
    <item>
      <title>Re: Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/130579#M48845</link>
      <description>&lt;P&gt;What you’re seeing is expected behavior: the _delta_log folder always keeps a history of JSON commit files, checkpoint files, and CRCs. Even if you lower delta.logRetentionDuration and run VACUUM, cleanup won’t happen immediately. A couple of points to note:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;The property delta.logRetentionDuration controls how long log history is kept for &lt;STRONG&gt;time travel&lt;/STRONG&gt;, but actual cleanup only happens when a new checkpoint is written &lt;EM&gt;and&lt;/EM&gt; retention thresholds are met.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Setting it to something like 1 minute will disable time travel almost immediately, but you still need to wait for the next compaction/checkpoint cycle to actually drop files.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;VACUUM only removes data files, not log files, so it won’t reduce _delta_log size on its own.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you really don’t need any history/time travel, the supported approach is to:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;Set spark.databricks.delta.retentionDurationCheck.enabled = false.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Use a very small delta.logRetentionDuration (like interval 1 minute).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Trigger a few commits (inserts/updates) so new checkpoints are written.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Delta will then automatically prune older JSON and CRC files beyond the retention window.&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Also note that the _delta_log folder will never be completely empty: at least the most recent checkpoint plus a few commit files are always retained.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Sep 2025 05:19:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/130579#M48845</guid>
      <dc:creator>michaeljac1986</dc:creator>
      <dc:date>2025-09-03T05:19:26Z</dc:date>
    </item>
    <item>
      <title>Re: Clean up _delta_log files</title>
      <link>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/141055#M51615</link>
      <description>&lt;P&gt;Delta Lake does automatically clean up _delta_log files (JSON, CHECKPOINT, CRC), but only when two conditions are met:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;The retention durations are respected&lt;BR /&gt;By default:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;delta.logRetentionDuration = 30 days&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;delta.deletedFileRetentionDuration = 7 days&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;spark.databricks.delta.retentionDurationCheck.enabled = true (safety check)&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;A new checkpoint is created after the retention window has passed&lt;BR /&gt;Cleanup only happens when a new checkpoint is written, not immediately when properties are changed.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;HR /&gt;
&lt;H3&gt;&lt;span class="lia-unicode-emoji" title=":heavy_check_mark:"&gt;✔️&lt;/span&gt; Why files are not being deleted in your case&lt;/H3&gt;
&lt;P&gt;Even though you set:&lt;/P&gt;
&lt;PRE&gt;SET spark.databricks.delta.retentionDurationCheck.enabled = false;

ALTER TABLE table_name
SET TBLPROPERTIES (
  'delta.logRetentionDuration'='interval 1 minutes',
  'delta.deletedFileRetentionDuration'='interval 1 minutes'
);

VACUUM table_name RETAIN 0 HOURS;
&lt;/PRE&gt;
&lt;P&gt;Delta still won't delete older log files unless:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;The retention interval has actually passed in wall-clock time&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;A new checkpoint is written after the retention window&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;The table has enough new commits to trigger a checkpoint (usually every 10 commits)&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Simply setting the retention to 1 minute does not retroactively delete anything. Delta only evaluates retention at checkpoint-creation time.&lt;/P&gt;
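&lt;P&gt;As a toy illustration in plain Python (not Delta's actual implementation; the helper name is hypothetical), the retention check applied at checkpoint time is a wall-clock comparison: log entries older than the cutoff are dropped, newer ones survive.&lt;/P&gt;

```python
from datetime import datetime, timedelta

def prune_log_files(files, now, retention):
    """Toy model of log cleanup: keep only entries whose commit
    timestamp falls inside the retention window. Delta applies this
    kind of age check when it writes a new checkpoint, not at the
    moment the table property is changed."""
    cutoff = now - retention
    return [name for name, ts in files if ts >= cutoff]

log = [
    ("00000000000000000001.json", datetime(2024, 1, 1)),
    ("00000000000000000002.json", datetime(2024, 1, 20)),
]
# With a 7-day window evaluated on Jan 21, only the Jan 20 commit survives.
kept = prune_log_files(log, now=datetime(2024, 1, 21), retention=timedelta(days=7))
```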
&lt;H4&gt;1. VACUUM does not delete JSON / CHECKPOINT log files&lt;/H4&gt;
&lt;P&gt;VACUUM only removes data files that are no longer referenced.&lt;BR /&gt;It never touches the transaction log.&lt;/P&gt;
&lt;P&gt;This is why your _delta_log folder still looks large.&lt;/P&gt;
&lt;H4&gt;2. _delta_log cleanup only happens during checkpoint creation&lt;/H4&gt;
&lt;P&gt;If you are not generating new transactions, no cleanup will happen.&lt;/P&gt;
&lt;H4&gt;3. Very low retention settings (like 1 minute) are not recommended&lt;/H4&gt;
&lt;P&gt;They can cause checkpoint conflicts and metadata corruption during concurrent writes.&lt;/P&gt;
&lt;H3&gt;You can force a cleanup safely&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Make sure the retention check is disabled on the cluster that writes the table:&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;SET spark.databricks.delta.retentionDurationCheck.enabled = false;
&lt;/PRE&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;
&lt;P&gt;Set a realistic, low but safe retention, e.g.:&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;ALTER TABLE table_name
SET TBLPROPERTIES (
  'delta.logRetentionDuration'='interval 1 day',
  'delta.deletedFileRetentionDuration'='interval 1 day'
);
&lt;/PRE&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;
&lt;P&gt;Generate a few commits to trigger a new checkpoint:&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;# Append one existing row back to the table to create a new commit
df = spark.table("table_name").limit(1)
df.write.mode("append").format("delta").saveAsTable("table_name")
&lt;/PRE&gt;
&lt;P&gt;Repeat 10 times to force a checkpoint.&lt;/P&gt;
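&lt;P&gt;The "every 10 commits" cadence can be sketched as simple arithmetic: 10 is the default delta.checkpointInterval in open-source Delta, and checkpoints land at versions divisible by it. The helper below is a hypothetical illustration and ignores any vendor-specific checkpoint triggers.&lt;/P&gt;

```python
def commits_until_next_checkpoint(current_version, checkpoint_interval=10):
    """Checkpoints are written at versions divisible by the interval
    (10, 20, ...), so count the commits left until the next multiple."""
    return checkpoint_interval - (current_version % checkpoint_interval)

# A table at version 7 needs 3 more commits before the checkpoint at version 10.
```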
&lt;OL start="4"&gt;
&lt;LI&gt;
&lt;P&gt;After the new checkpoint, older log files (beyond retention) will be removed automatically.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;HR /&gt;
&lt;H3&gt;In Summary:&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;_delta_log files are not deleted by VACUUM&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;They are only deleted during checkpoint creation&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Changing retention properties does not delete old logs immediately&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;You must generate commits and allow a checkpoint to be created&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Only then will Delta remove logs older than the retention window&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 03 Dec 2025 18:41:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clean-up-delta-log-files/m-p/141055#M51615</guid>
      <dc:creator>iyashk-DB</dc:creator>
      <dc:date>2025-12-03T18:41:46Z</dc:date>
    </item>
  </channel>
</rss>

