Hi YuriS,
How are you doing today? As per my understanding, you're absolutely right to look into the USING INVENTORY clause for VACUUM, especially when dealing with large storage footprints. The tricky part is that while this feature is part of open-source Delta Lake, it's not yet fully supported or documented in Databricks' managed Delta implementation, which explains why you're seeing unexpected results and can't find it in the official Databricks docs.
In your case, the VACUUM command runs but doesn't delete anything because Databricks isn't actually wired to act on external inventory metadata yet, even though it parses the syntax without error. That's why your dry-run VACUUM (without inventory) shows about 1K files ready to be cleaned, while the inventory-based VACUUM does nothing: it isn't consuming the external inventory report in any meaningful way on Databricks at this time.
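For reference, this is roughly what the inventory-based form looks like in open-source Delta Lake (a sketch, not something I'd expect to work on Databricks today; the table name and inventory source are illustrative, and the inventory is expected to expose path, length, isDir, and modificationTime columns per the OSS Delta docs):

    -- Hypothetical table names; inventory_report could be loaded from an S3 Inventory export
    VACUUM my_table
    USING INVENTORY (
      SELECT path, length, isDir, modificationTime
      FROM inventory_report
    )
    RETAIN 168 HOURS DRY RUN;

Running it with DRY RUN first, as you did, is the right instinct; the issue is simply that the inventory input isn't honored on the managed platform yet.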
So for now, I'd suggest sticking with the standard VACUUM approach in Databricks, running DRY RUN regularly to monitor what would be removed (see the sketch below). You could also automate this with a custom retention window to stay efficient. Hopefully Databricks adds support for inventory-based vacuuming soon, since it's a great fit for large cloud storage environments, but as of now it's not officially supported on the managed platform. Let me know if you'd like help setting up a more efficient vacuum strategy based on what Databricks does support today!
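As a minimal sketch of that standard approach (my_table and the 168-hour window are just examples; pick a retention that matches your time-travel needs, and note that going below the 7-day default requires relaxing the spark.databricks.delta.retentionDurationCheck.enabled safety check):

    -- Preview which files would be deleted, without removing anything
    VACUUM my_table RETAIN 168 HOURS DRY RUN;

    -- Once the preview looks right, run the actual cleanup
    VACUUM my_table RETAIN 168 HOURS;

Scheduling the DRY RUN as a periodic job and reviewing its output before the real VACUUM gives you most of the visibility the inventory report would have provided.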
Regards,
Brahma