<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: VACUUM with Azure Storage Inventory Report is not working in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/vacuum-with-azure-storage-inventory-report-is-not-working/m-p/116143#M45253</link>
    <description>&lt;P&gt;After additional investigation, it turned out that the proper "fully-qualified-URL" path should be 'dbfs:/mnt/...':&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;'dbfs:/mnt/{endpoint}/' || ir.Name as path,&lt;/LI-CODE&gt;&lt;P&gt;and not:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;'https://xxx.blob.core.windows.net/' || ir.Name as path,&lt;/LI-CODE&gt;</description>
    <pubDate>Tue, 22 Apr 2025 08:17:48 GMT</pubDate>
    <dc:creator>YuriS</dc:creator>
    <dc:date>2025-04-22T08:17:48Z</dc:date>
    <item>
      <title>VACUUM with Azure Storage Inventory Report is not working</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-with-azure-storage-inventory-report-is-not-working/m-p/115827#M45194</link>
      <description>&lt;P&gt;Could someone please advise on using VACUUM with an Azure Storage Inventory Report? I have failed to make it work.&lt;BR /&gt;&lt;BR /&gt;On DBR 15.4 LTS, the VACUUM command is run with the USING INVENTORY clause, as follows:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;VACUUM schema.table USING INVENTORY (
select 'https://xxx.blob.core.windows.net/' || ir.Name as path,
          ir.`Content-Length` as length,
          case when ir.hdi_isfolder is null then false else ir.hdi_isfolder end as isDir,
          ir.`Last-Modified`  as modificationTime
    from inventory_raw ir
   where ...
)&lt;/LI-CODE&gt;&lt;DIV&gt;&lt;DIV&gt;The command does not fail, but it does not vacuum anything either.&lt;/DIV&gt;&lt;DIV&gt;The DESCRIBE HISTORY output is as follows:&lt;BR /&gt;&lt;BR /&gt;VACUUM END {"numDeletedFiles":"0","numVacuumedDirectories":"1"}&lt;BR /&gt;VACUUM START {"numFilesToDelete":"0","sizeOfDataToDelete":"0"}&lt;/DIV&gt;&lt;DIV&gt;At the same time, VACUUM &lt;U&gt;without&lt;/U&gt; the USING INVENTORY clause, but with the DRY RUN option, shows 1k files to be vacuumed.&lt;/DIV&gt;&lt;DIV&gt;Can someone also advise whether the USING INVENTORY clause really works on Databricks' version of Delta? I could not find any information in the official Databricks docs, only here:&amp;nbsp;&lt;A href="https://delta.io/blog/efficient-delta-vacuum/" target="_blank"&gt;https://delta.io/blog/efficient-delta-vacuum/&lt;/A&gt;&lt;/DIV&gt;&lt;DIV&gt;Thank you&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 18 Apr 2025 08:45:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-with-azure-storage-inventory-report-is-not-working/m-p/115827#M45194</guid>
      <dc:creator>YuriS</dc:creator>
      <dc:date>2025-04-18T08:45:04Z</dc:date>
    </item>
    <item>
      <title>Re: VACUUM with Azure Storage Inventory Report is not working</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-with-azure-storage-inventory-report-is-not-working/m-p/115904#M45210</link>
      <description>&lt;P&gt;Hi YuriS,&lt;/P&gt;&lt;P&gt;How are you doing today? As per my understanding, you're right to look into the USING INVENTORY clause for VACUUM, especially when dealing with large storage footprints. The tricky part is that while this feature is part of open-source Delta Lake, it's not yet fully supported or documented in Databricks' managed Delta implementation, which explains why you're seeing unexpected results and not finding official documentation in the Databricks docs.&lt;/P&gt;&lt;P&gt;In your case, the VACUUM command runs but doesn't delete anything because Databricks isn't actually wired to act on external inventory metadata yet, even though it parses the syntax without error. That's why your dry-run vacuum (without inventory) shows 1K files ready to be cleaned, but the inventory-based vacuum does nothing: it's not using the external inventory report in a meaningful way within Databricks at this time.&lt;/P&gt;&lt;P&gt;So for now, I'd suggest sticking with the standard VACUUM approach in Databricks, possibly using DRY RUN regularly to monitor what would be removed. You could also automate this with a custom retention window to stay efficient. Hopefully, Databricks adds support for inventory-based vacuuming soon, especially since it's great for large cloud storage environments, but as of now, it's not officially supported on the managed platform. Let me know if you'd like help setting up a more efficient vacuum strategy based on what Databricks does support today!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Sat, 19 Apr 2025 03:24:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-with-azure-storage-inventory-report-is-not-working/m-p/115904#M45210</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2025-04-19T03:24:04Z</dc:date>
    </item>
    <item>
      <title>Re: VACUUM with Azure Storage Inventory Report is not working</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-with-azure-storage-inventory-report-is-not-working/m-p/116143#M45253</link>
      <description>&lt;P&gt;After additional investigation, it turned out that the proper "fully-qualified-URL" path should be 'dbfs:/mnt/...':&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;'dbfs:/mnt/{endpoint}/' || ir.Name as path,&lt;/LI-CODE&gt;&lt;P&gt;and not:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;'https://xxx.blob.core.windows.net/' || ir.Name as path,&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 22 Apr 2025 08:17:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-with-azure-storage-inventory-report-is-not-working/m-p/116143#M45253</guid>
      <dc:creator>YuriS</dc:creator>
      <dc:date>2025-04-22T08:17:48Z</dc:date>
    </item>
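Combining the fix from the last reply with the query in the original question, the corrected statement might look like the sketch below. It only swaps the path prefix. It assumes the storage container is mounted under dbfs:/mnt/, and it keeps the {endpoint} placeholder and the elided where clause exactly as posted, so both have to be filled in for a real table before this can run.

```sql
-- Sketch only: {endpoint} and the where clause are placeholders from the thread.
VACUUM schema.table USING INVENTORY (
  select 'dbfs:/mnt/{endpoint}/' || ir.Name as path,   -- DBFS mount path, not the https:// blob URL
         ir.`Content-Length` as length,
         case when ir.hdi_isfolder is null then false else ir.hdi_isfolder end as isDir,
         ir.`Last-Modified` as modificationTime
    from inventory_raw ir
   where ...
)
```

If the inventory paths resolve to the table's files, DESCRIBE HISTORY should presumably report a nonzero numFilesToDelete instead of the 0 shown in the original question.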
  </channel>
</rss>

