<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Optimize and Vacuum Command in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/optimize-and-vaccum-command/m-p/120271#M46111</link>
    <description>&lt;P&gt;&lt;SPAN&gt;That's a valid point about minimal read queries! However, while immediate storage reduction might not be necessary, consistent data integrity and potential future reporting needs might still warrant occasional OPTIMIZE and VACUUM runs, even with external tables. What's your perspective on data lifecycle management here?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 27 May 2025 01:58:03 GMT</pubDate>
    <dc:creator>JaimeAnders</dc:creator>
    <dc:date>2025-05-27T01:58:03Z</dc:date>
    <item>
      <title>Optimize and Vacuum Command</title>
      <link>https://community.databricks.com/t5/data-engineering/optimize-and-vaccum-command/m-p/59239#M31346</link>
      <description>&lt;P&gt;Hi team,&lt;/P&gt;&lt;P&gt;I run a weekly purge process from Databricks notebooks that deletes a chunk of records from tables used for audit purposes. The tables are external tables. I need clarification on the items below:&lt;/P&gt;&lt;P&gt;1. Do I need to run the OPTIMIZE and VACUUM commands? Very few read queries are executed against the audit tables.&lt;/P&gt;&lt;P&gt;2. If I do need to run them, should I add the OPTIMIZE and VACUUM commands to the same notebook to shrink the storage layer?&lt;/P&gt;&lt;P&gt;3. What scenarios should I look for when deciding whether to run OPTIMIZE and VACUUM on tables involved in a purge process?&lt;/P&gt;&lt;P&gt;4. If I take no action, will Databricks and the Apache Spark framework take care of optimization internally?&lt;/P&gt;</description>
      <pubDate>Sun, 04 Feb 2024 14:11:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimize-and-vaccum-command/m-p/59239#M31346</guid>
      <dc:creator>Ramakrishnan83</dc:creator>
      <dc:date>2024-02-04T14:11:23Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize and Vacuum Command</title>
      <link>https://community.databricks.com/t5/data-engineering/optimize-and-vaccum-command/m-p/59319#M31368</link>
      <description>&lt;P&gt;Hi &lt;SPAN class=""&gt;&lt;A class="" href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96694" target="_self"&gt;&lt;SPAN class=""&gt;Ramakrishnan83,&lt;/SPAN&gt;&lt;/A&gt;&lt;BR /&gt;1. The VACUUM command only works with Delta tables. It deletes Parquet data files that the table no longer references and that are older than the retention period, which is 7 days by default. OPTIMIZE, in contrast, compacts small files into larger ones and can co-locate related data when ordering columns are specified (for example with ZORDER BY).&lt;BR /&gt;2. Ideally, per the Databricks recommendation, if data is written continuously, the OPTIMIZE command should be run daily.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;3. The two commands improve the table in different ways:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN class=""&gt;OPTIMIZE co-locates the data based on patterns in the dataset.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;VACUUM deletes Parquet files that the table no longer references from the storage layer.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN class=""&gt;Please refer to these articles for more details:&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;A href="https://docs.databricks.com/en/delta/optimize.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/delta/optimize.html&lt;/A&gt; &lt;A href="https://docs.databricks.com/en/sql/language-manual/delta-optimize.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/sql/language-manual/delta-optimize.html&lt;/A&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Feb 2024 18:21:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimize-and-vaccum-command/m-p/59319#M31368</guid>
      <dc:creator>Hkesharwani</dc:creator>
      <dc:date>2024-02-05T18:21:42Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize and Vacuum Command</title>
      <link>https://community.databricks.com/t5/data-engineering/optimize-and-vaccum-command/m-p/120271#M46111</link>
      <description>&lt;P&gt;&lt;SPAN&gt;That's a valid point about minimal read queries! However, while immediate storage reduction might not be necessary, consistent data integrity and potential future reporting needs might still warrant occasional OPTIMIZE and VACUUM runs, even with external tables. What's your perspective on data lifecycle management here?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 May 2025 01:58:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimize-and-vaccum-command/m-p/120271#M46111</guid>
      <dc:creator>JaimeAnders</dc:creator>
      <dc:date>2025-05-27T01:58:03Z</dc:date>
    </item>
  </channel>
</rss>

