<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: VACUUM vs VACUUM LITE in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140363#M51402</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/60098"&gt;@K_Anudeep&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot for explaining it clearly!&lt;/P&gt;&lt;P&gt;Thanks also for owning this, creating a GitHub issue, and providing a workaround to identify the VACUUM type.&lt;/P&gt;&lt;P&gt;I hope we get to see this flag in the Delta history soon!&lt;/P&gt;</description>
    <pubDate>Wed, 26 Nov 2025 05:04:37 GMT</pubDate>
    <dc:creator>analyticsnerd</dc:creator>
    <dc:date>2025-11-26T05:04:37Z</dc:date>
    <item>
      <title>VACUUM vs VACUUM LITE</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140359#M51399</link>
      <description>&lt;P&gt;Hey Team,&lt;/P&gt;&lt;P&gt;I have a few questions regarding VACUUM and VACUUM LITE:&lt;/P&gt;&lt;P&gt;1. How do they work internally? Do both of them scan the entire table storage directory?&lt;/P&gt;&lt;P&gt;2. How should we use these in our prod jobs? Should we always run VACUUM LITE, always run VACUUM, or use a combination?&lt;/P&gt;&lt;P&gt;3. How can we tell which operation (FULL or LITE) was performed on the table when VACUUM is run? I tried running it and don't see anything about it in the Delta history.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Nov 2025 02:48:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140359#M51399</guid>
      <dc:creator>analyticsnerd</dc:creator>
      <dc:date>2025-11-26T02:48:50Z</dc:date>
    </item>
    <item>
      <title>Re: VACUUM vs VACUUM LITE</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140360#M51400</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198892"&gt;@analyticsnerd&lt;/a&gt;&amp;nbsp;!&lt;/P&gt;
&lt;P&gt;Below are the answers to your questions:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;1. How do they work internally? Do both of them scan the entire table storage directory?&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;VACUUM (or VACUUM FULL) does a full listing of the table directory to identify files to delete.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;VACUUM LITE relies on the remove-file actions in the Delta log to identify files for deletion, so it doesn't have to scan the entire directory. It is also incremental: it stores the version it last read for vacuum in&amp;nbsp;&lt;CODE&gt;_last_vacuum_info&lt;/CODE&gt;&amp;nbsp;inside the Delta log directory. Available from Databricks Runtime 16.1.&lt;/SPAN&gt;&lt;/P&gt;
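&lt;P&gt;To make the difference concrete, here is a toy model in plain Python. It is only a sketch: sets and dicts stand in for cloud storage and the Delta log, the function names and file names are made up for illustration, and the retention window is ignored for brevity. It is not how the engine is implemented.&lt;/P&gt;

```python
# Toy model only: plain Python sets and dicts stand in for storage and the Delta log.
# All names here are hypothetical and the retention window is ignored.


def full_vacuum_candidates(storage_files, live_files):
    """FULL: list every file in the table directory, keep those the table no longer references."""
    return set(storage_files) - set(live_files)


def lite_vacuum_candidates(log, last_vacuum_version):
    """LITE: read only the remove actions in versions after the last vacuumed version."""
    latest_version = max(log)
    candidates = set()
    for version in range(last_vacuum_version + 1, latest_version + 1):
        for action, path in log[version]:
            if action == "remove":
                candidates.add(path)
    # The caller would persist latest_version so the next run starts from there.
    return candidates, latest_version


# Commit version mapped to the list of (action, path) pairs in that commit.
log = {
    0: [("add", "part-0.parquet")],
    1: [("add", "part-1.parquet")],
    2: [("remove", "part-0.parquet"), ("add", "part-2.parquet")],
}
# orphan.parquet was written to storage but never committed to the log.
storage_files = {"part-0.parquet", "part-1.parquet", "part-2.parquet", "orphan.parquet"}
live_files = {"part-1.parquet", "part-2.parquet"}

full = full_vacuum_candidates(storage_files, live_files)
lite, checkpoint = lite_vacuum_candidates(log, last_vacuum_version=-1)

print(sorted(full))
print(sorted(lite))
print(checkpoint)
```

&lt;P&gt;In this model the uncommitted file only shows up in the directory listing, while the log-based pass touches just the commits since the stored version. Treat that as an illustration of the model, not a statement about the engine's exact behaviour.&lt;/P&gt;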
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;2. How should we use these in our prod jobs...I mean, should we always run VACUUM LITE, VACUUM, or a combination?&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Below are a few best practices:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Run VACUUM daily/regularly on the tables to minimise the retention of unused files.&lt;/LI&gt;
&lt;LI&gt;Avoid writing too many very small files, which causes long listing times whenever VACUUM is executed.&lt;/LI&gt;
&lt;LI&gt;Do not run VACUUM with RETAIN 0 HOURS.&lt;/LI&gt;
&lt;LI&gt;Also review these best practices before running VACUUM:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/en/delta/vacuum.html#what-size-cluster-does-vacuum-need" target="_blank" rel="noopener noreferrer"&gt;https://docs.databricks.com/en/delta/vacuum.html#what-size-cluster-does-vacuum-need&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Run VACUUM LITE daily/weekly and a full VACUUM once in a while (monthly or quarterly) as a best practice.&lt;/LI&gt;
&lt;/UL&gt;
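&lt;P&gt;One way to combine the two modes in a scheduled job is sketched below. The helper name, the table name and the "full run on the first of the month" rule are assumptions made for illustration. It also assumes a runtime that accepts the LITE and FULL keywords and relies on the table's default retention.&lt;/P&gt;

```python
from datetime import date


def vacuum_statement(table_name, run_date):
    """Build a VACUUM statement: FULL on the first day of the month, LITE on every other day."""
    # Hypothetical schedule: one full directory listing per month, log-based runs otherwise.
    mode = "FULL" if run_date.day == 1 else "LITE"
    return f"VACUUM {table_name} {mode}"


# main.sales.orders is a made-up table name.
print(vacuum_statement("main.sales.orders", date(2025, 11, 1)))
print(vacuum_statement("main.sales.orders", date(2025, 11, 26)))
```

&lt;P&gt;In a notebook or job, the returned string could be passed to spark.sql(...). Swap the date rule for a quarterly one if a monthly full run is more than the table needs.&lt;/P&gt;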
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;3. How can we tell which operation (FULL or LITE) was performed on the table when VACUUM is run? I tried running it and don't see anything about it in the Delta history.&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This is a great question, and I agree it's a pain point. I will work on including the type of VACUUM in the Delta log.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Until then, there is a workaround to find out which VACUUM was run: look at the driver logs. Below are example logs:&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;25/08/15 04:35:42 INFO VacuumCommand: Starting garbage collection (dryRun = true) of untracked files older than 15 Aug 2025 02:35:42 GMT in dbfs:/user/hive/warehouse/ak_db.db/delta_16_4_optimize

25/07/31 10:48:07 INFO VacuumCommand: Found 1076 files (731672207 bytes) and directories in a total of 1 directories that are safe to delete. Vacuum stats: DeltaVacuumStats(true,None,604800000,1753354083869,1,0,0,0,3795,0,1753958883805,1753958887875,4,4,4,false,0,0,1078,None,None,LITE)

25/07/31 10:59:46 INFO VacuumCommand: Deleted 1076 files (731672207 bytes) and directories in a total of 1 directories. Vacuum stats: DeltaVacuumStats(false,Some(0),604800000,1753959524951,1,1093,1076,731672207,51387,3939,1753959524900,1753959585658,4,4,4,false,0,0,1086,Some(0),Some(1086),LITE)&lt;/LI-CODE&gt;
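&lt;P&gt;If you want to pull that value out of driver logs programmatically, a rough sketch follows. The function name is made up, and it assumes the vacuum type is the last field of DeltaVacuumStats, as in the sample lines above. That field position is an observation from these logs, not a documented format, so it may change between versions.&lt;/P&gt;

```python
import re


def vacuum_type_from_log(line):
    """Return the last DeltaVacuumStats field (for example "LITE"), or None if the line has no stats."""
    match = re.search(r"DeltaVacuumStats\((.*)\)", line)
    if match is None:
        return None
    # Assumption: the vacuum type is the final comma-separated field.
    return match.group(1).split(",")[-1].strip()


# Sample line copied from the driver log above.
sample = (
    "25/07/31 10:59:46 INFO VacuumCommand: Deleted 1076 files (731672207 bytes) and directories "
    "in a total of 1 directories. Vacuum stats: DeltaVacuumStats(false,Some(0),604800000,"
    "1753959524951,1,1093,1076,731672207,51387,3939,1753959524900,1753959585658,4,4,4,false,0,0,"
    "1086,Some(0),Some(1086),LITE)"
)

print(vacuum_type_from_log(sample))  # LITE
```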
&lt;P&gt;From the above, you can see the word LITE at the end of the VACUUM stats, which tells us that a VACUUM LITE was run. I still agree that this should be in the Delta history; I will work on it and get back soon with an update.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Nov 2025 03:11:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140360#M51400</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-11-26T03:11:10Z</dc:date>
    </item>
    <item>
      <title>Re: VACUUM vs VACUUM LITE</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140362#M51401</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198892"&gt;@analyticsnerd&lt;/a&gt;&amp;nbsp;, I have created a GitHub issue, and I will work on it later&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Issue&lt;/STRONG&gt;:&amp;nbsp;&lt;A href="https://github.com/delta-io/delta/issues/5586" target="_blank"&gt;https://github.com/delta-io/delta/issues/5586&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Nov 2025 03:39:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140362#M51401</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-11-26T03:39:58Z</dc:date>
    </item>
    <item>
      <title>Re: VACUUM vs VACUUM LITE</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140363#M51402</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/60098"&gt;@K_Anudeep&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot for explaining it clearly!&lt;/P&gt;&lt;P&gt;Thanks also for owning this, creating a GitHub issue, and providing a workaround to identify the VACUUM type.&lt;/P&gt;&lt;P&gt;I hope we get to see this flag in the Delta history soon!&lt;/P&gt;</description>
      <pubDate>Wed, 26 Nov 2025 05:04:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140363#M51402</guid>
      <dc:creator>analyticsnerd</dc:creator>
      <dc:date>2025-11-26T05:04:37Z</dc:date>
    </item>
    <item>
      <title>Re: VACUUM vs VACUUM LITE</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140381#M51406</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/60098"&gt;@K_Anudeep&lt;/a&gt;&amp;nbsp;for the insights on VACUUM operations.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Nov 2025 08:37:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-vs-vacuum-lite/m-p/140381#M51406</guid>
      <dc:creator>Raman_Unifeye</dc:creator>
      <dc:date>2025-11-26T08:37:59Z</dc:date>
    </item>
  </channel>
</rss>

