<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Myths about vacuum command in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/97165#M39445</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;&amp;nbsp;&amp;amp;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;: I found the command "vacuum table1 retain 7 days" in many YouTube videos and other educational content. This is very misleading. I found another way around this.&lt;/P&gt;&lt;P&gt;set spark.databricks.delta.retentionDurationCheck.enabled = false. It works if I want to delete obsolete files whose age is less than the default retention duration.&lt;/P&gt;</description>
    <pubDate>Fri, 01 Nov 2024 06:41:31 GMT</pubDate>
    <dc:creator>Sangram</dc:creator>
    <dc:date>2024-11-01T06:41:31Z</dc:date>
    <item>
      <title>Myths about vacuum command</title>
      <link>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/96737#M39330</link>
      <description>&lt;P&gt;I identified some myths while working with the VACUUM command on Spark 3.5.x.&lt;/P&gt;&lt;P&gt;1. The VACUUM command does not work with days. Instead, its RETAIN clause explicitly requires values in hours. I tried many times, and it throws a parse syntax error (why?).&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="sangram11_0-1730255825227.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12373iC2F6EA35D3FDDAA5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="sangram11_0-1730255825227.png" alt="sangram11_0-1730255825227.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;2. You cannot execute the VACUUM command if delta.enableChangeDataFeed is enabled, because it cannot remove files from the _change_data folder if that folder contains Parquet files.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="sangram11_1-1730256066071.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12374i702346078A031D6F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="sangram11_1-1730256066071.png" alt="sangram11_1-1730256066071.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;So your table history is not deleted by the VACUUM command if CDF is enabled.&lt;/P&gt;&lt;P&gt;Let me know if you can share some knowledge about the VACUUM command, because I feel it is not working as expected.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Oct 2024 02:49:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/96737#M39330</guid>
      <dc:creator>sangram11</dc:creator>
      <dc:date>2024-10-30T02:49:33Z</dc:date>
    </item>
    <item>
      <title>Re: Myths about vacuum command</title>
      <link>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/96970#M39377</link>
      <description>&lt;P&gt;&lt;STRONG&gt;It is due to the retention period for Change Data Feed&lt;/STRONG&gt;: When CDF is enabled, Databricks retains the change data for a specified period. This retention period ensures that the change data is available for downstream processing and auditing. The VACUUM command respects this retention period and does not delete files that are still within the retention window.&lt;BR /&gt;&lt;BR /&gt;Verify the retention period for Change Data Feed and ensure that the VACUUM command's retention &lt;U&gt;&lt;STRONG&gt;period is greater than or equal&lt;/STRONG&gt;&lt;/U&gt; to the CDF retention period.&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;-- Inspect the table history (past operations and their timestamps)&lt;/DIV&gt;&lt;DIV&gt;DESCRIBE HISTORY my_delta_table;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;-- Adjust the retention period for the VACUUM command&lt;/DIV&gt;&lt;DIV&gt;VACUUM my_delta_table RETAIN 168 HOURS;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;If you want to change the default retention period of the change data feed, do this:&lt;BR /&gt;ALTER TABLE my_delta_table&lt;BR /&gt;SET TBLPROPERTIES ('delta.changeDataFeed.retentionDuration' = '30 days');&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 31 Oct 2024 11:09:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/96970#M39377</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2024-10-31T11:09:24Z</dc:date>
    </item>
    <item>
      <title>Re: Myths about vacuum command</title>
      <link>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/96979#M39380</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;
&lt;P&gt;1. The VACUUM command does not work with days. Instead, its RETAIN clause explicitly requires values in hours. I tried many times, and it throws a parse syntax error (why?).&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Can you please point us to where it is mentioned that VACUUM accepts "days" as a parameter? We may need to have that specific document updated. This is what I was able to find:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html#parameters" target="_blank"&gt;https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html#parameters&lt;/A&gt;&lt;/P&gt;
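&lt;P&gt;Since the RETAIN clause is specified in hours, a retention period stated in days has to be converted before it goes into the statement. Here is a minimal Python sketch of that conversion. The helper name vacuum_sql is hypothetical, table1 is the table name from the command quoted in this thread, and the sketch only builds the statement text without running it.&lt;/P&gt;
&lt;PRE&gt;
```python
HOURS_PER_DAY = 24


def vacuum_sql(table, days):
    """Build a VACUUM statement with the retention period expressed in hours."""
    # Assumption for this sketch: only whole, non-negative day counts are accepted.
    if not isinstance(days, int) or not days >= 0:
        raise ValueError("days must be a non-negative whole number")
    hours = days * HOURS_PER_DAY
    return f"VACUUM {table} RETAIN {hours} HOURS"


# 7 days is converted to 168 hours before the statement is built.
print(vacuum_sql("table1", 7))
```
&lt;/PRE&gt;
&lt;P&gt;The resulting string could then be passed to spark.sql(...) in a notebook, assuming a SparkSession named spark is available.&lt;/P&gt;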
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;So your table history is not deleted by the VACUUM command if CDF is enabled.&lt;/BLOCKQUOTE&gt;
&lt;P&gt;To summarize Saurabh's comment, the VACUUM command can still run on a table with CDF enabled, but it will respect the CDF retention period. Files that are within the CDF retention period will not be deleted by VACUUM, ensuring that change data remains available for processing. To avoid conflicts, verify that the VACUUM retention period is greater than or equal to the CDF retention period. Adjust the CDF retention period if necessary using the delta.changeDataFeed.retentionDuration property.&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2024 12:28:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/96979#M39380</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-10-31T12:28:08Z</dc:date>
    </item>
    <item>
      <title>Re: Myths about vacuum command</title>
      <link>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/97165#M39445</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;&amp;nbsp;&amp;amp;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;: I found the command "vacuum table1 retain 7 days" in many YouTube videos and other educational content. This is very misleading. I found another way around this.&lt;/P&gt;&lt;P&gt;set spark.databricks.delta.retentionDurationCheck.enabled = false. It works if I want to delete obsolete files whose age is less than the default retention duration.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Nov 2024 06:41:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/97165#M39445</guid>
      <dc:creator>Sangram</dc:creator>
      <dc:date>2024-11-01T06:41:31Z</dc:date>
    </item>
    <item>
      <title>Re: Myths about vacuum command</title>
      <link>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/97175#M39451</link>
      <description>&lt;P&gt;Thanks for reporting this, Sangram. Is this YouTube and educational content on the Databricks channel?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;gt;&amp;nbsp;set spark.databricks.delta.retentionDurationCheck.enabled = false. It works if I want to delete obsolete files whose age is less than the default retention duration.&lt;/STRONG&gt;&lt;/P&gt;
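&lt;P&gt;For context on the setting quoted above, here is a rough Python sketch of when disabling the check is needed at all. It assumes the check compares the RETAIN value against the default threshold of 7 days (168 hours); the helper names needs_check_disabled and vacuum_statements are hypothetical, and the sketch only builds the statement text without running it.&lt;/P&gt;
&lt;PRE&gt;
```python
# Assumption: the safety check uses the default threshold of 7 days (168 hours).
DEFAULT_RETENTION_HOURS = 7 * 24


def needs_check_disabled(retain_hours, threshold_hours=DEFAULT_RETENTION_HOURS):
    """Return True when the requested RETAIN value is shorter than the threshold."""
    return threshold_hours > retain_hours


def vacuum_statements(table, retain_hours):
    """List the SQL statements to run, adding the SET only when it is required."""
    statements = []
    if needs_check_disabled(retain_hours):
        # Disabling the check removes a safeguard, so it is added only on demand.
        statements.append(
            "SET spark.databricks.delta.retentionDurationCheck.enabled = false"
        )
    statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return statements


for statement in vacuum_statements("table1", 24):
    print(statement)
```
&lt;/PRE&gt;
&lt;P&gt;With a retention of 168 hours or more, the sketch emits only the VACUUM statement and the check stays enabled.&lt;/P&gt;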
&lt;P&gt;That's fine as long as you know this also introduces a risk: any files essential for tracking data changes, maintaining historical versions, or supporting CDF operations could be deleted prematurely. This could result in unintentional data loss, such as loss of previous data states or inability to access certain changes, impacting versioning and downstream processes.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Nov 2024 08:25:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/myths-about-vacuum-command/m-p/97175#M39451</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-11-01T08:25:11Z</dc:date>
    </item>
  </channel>
</rss>

