<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138294#M50905</link>
    <description>&lt;P&gt;Great question&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/197109"&gt;@eyalholzmann&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;On a Databricks Delta table with UniForm (Iceberg reads) enabled, &lt;STRONG&gt;VACUUM does NOT automatically clean up the corresponding Iceberg metadata&lt;/STRONG&gt;. The two metadata layers are managed separately, and understanding this distinction is critical to avoid query failures when Iceberg clients read stale metadata.&lt;/P&gt;
&lt;H2&gt;How Metadata Cleanup Works&lt;/H2&gt;
&lt;H3&gt;Delta Lake VACUUM Behavior&lt;/H3&gt;
&lt;P&gt;When you run VACUUM on a Delta table with UniForm enabled, the operation removes data files that the Delta transaction log no longer references and that are older than the retention threshold (7 days by default). This standard Delta Lake cleanup only consults the Delta transaction log when deciding which files to remove.&lt;/P&gt;
&lt;H3&gt;Iceberg Metadata Management&lt;/H3&gt;
&lt;P&gt;The Iceberg metadata generated by UniForm is stored separately in the table directory under the `/metadata/` subdirectory as versioned JSON files following the pattern `&amp;lt;table-path&amp;gt;/metadata/&amp;lt;version-number&amp;gt;-&amp;lt;uuid&amp;gt;.metadata.json`. These metadata files track their own snapshots and manifest files independently from Delta's transaction log.&lt;/P&gt;
&lt;H3&gt;Critical Risk: Metadata Synchronization&lt;/H3&gt;
&lt;P&gt;A significant operational concern exists when using path-based Iceberg clients: &lt;STRONG&gt;users may encounter errors when querying Iceberg tables using out-of-date metadata versions after VACUUM removes Parquet data files from the Delta table&lt;/STRONG&gt;. This happens because:&lt;/P&gt;
&lt;P&gt;- The Iceberg metadata files may still reference data files that VACUUM has removed&lt;BR /&gt;- Path-based Iceberg clients require manual updating and refreshing of metadata JSON paths to read current table versions&lt;BR /&gt;- There's no automatic cleanup mechanism that removes stale Iceberg metadata when corresponding data files are vacuumed&lt;/P&gt;
&lt;H2&gt;Recommended Approach&lt;/H2&gt;
&lt;P&gt;To manage this setup effectively:&lt;/P&gt;
&lt;P&gt;1. &lt;STRONG&gt;Enable Predictive Optimization&lt;/STRONG&gt;: Databricks recommends enabling predictive optimization for Unity Catalog managed tables, which automatically handles VACUUM operations and maintenance tasks&lt;/P&gt;
&lt;P&gt;2. &lt;STRONG&gt;Monitor Metadata Status&lt;/STRONG&gt;: Use `DESCRIBE EXTENDED table_name` to check the `converted_delta_version` and `converted_delta_timestamp` fields to verify which Delta version corresponds to the current Iceberg metadata&lt;/P&gt;
&lt;P&gt;3. &lt;STRONG&gt;Manual Metadata Refresh&lt;/STRONG&gt;: If metadata becomes stale, use `MSCK REPAIR TABLE &amp;lt;table-name&amp;gt; SYNC METADATA` to manually trigger Iceberg metadata regeneration&lt;/P&gt;
&lt;P&gt;4. &lt;STRONG&gt;Coordinate Retention Periods&lt;/STRONG&gt;: Ensure your VACUUM retention period is long enough to account for any lag in Iceberg metadata updates and client access patterns&lt;/P&gt;
&lt;P&gt;The key takeaway is that Iceberg metadata cleanup is &lt;STRONG&gt;not automatic&lt;/STRONG&gt;&amp;nbsp;when running VACUUM, and you must carefully manage metadata synchronization to prevent Iceberg clients from attempting to read files that have been removed by Delta's cleanup processes.&lt;/P&gt;
&lt;P&gt;Hope this helps, Louis.&lt;/P&gt;</description>
    <pubDate>Sun, 09 Nov 2025 15:24:27 GMT</pubDate>
    <dc:creator>Louis_Frolio</dc:creator>
    <dc:date>2025-11-09T15:24:27Z</dc:date>
    <item>
      <title>Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature?</title>
      <link>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138253#M50888</link>
      <description>&lt;P&gt;I'm working with &lt;STRONG&gt;Delta tables&lt;/STRONG&gt; using the &lt;STRONG&gt;Iceberg Uniform feature&lt;/STRONG&gt; to enable Iceberg-compatible reads. I’m trying to understand how metadata cleanup works in this setup.&lt;/P&gt;&lt;P&gt;Specifically, does the &lt;STRONG&gt;VACUUM operation&lt;/STRONG&gt;, which removes old Delta Lake metadata based on the retention period, also trigger deletion of the corresponding &lt;STRONG&gt;Iceberg metadata&lt;/STRONG&gt;? Or is Iceberg metadata managed separately, with its own cleanup process?&lt;/P&gt;</description>
      <pubDate>Sun, 09 Nov 2025 09:43:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138253#M50888</guid>
      <dc:creator>eyalholzmann</dc:creator>
      <dc:date>2025-11-09T09:43:47Z</dc:date>
    </item>
    <item>
      <title>Re: Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature?</title>
      <link>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138294#M50905</link>
      <description>&lt;P&gt;Great question&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/197109"&gt;@eyalholzmann&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;On a Databricks Delta table with UniForm (Iceberg reads) enabled, &lt;STRONG&gt;VACUUM does NOT automatically clean up the corresponding Iceberg metadata&lt;/STRONG&gt;. The two metadata layers are managed separately, and understanding this distinction is critical to avoid query failures when Iceberg clients read stale metadata.&lt;/P&gt;
&lt;H2&gt;How Metadata Cleanup Works&lt;/H2&gt;
&lt;H3&gt;Delta Lake VACUUM Behavior&lt;/H3&gt;
&lt;P&gt;When you run VACUUM on a Delta table with UniForm enabled, the operation removes data files that the Delta transaction log no longer references and that are older than the retention threshold (7 days by default). This standard Delta Lake cleanup only consults the Delta transaction log when deciding which files to remove.&lt;/P&gt;
&lt;H3&gt;Iceberg Metadata Management&lt;/H3&gt;
&lt;P&gt;The Iceberg metadata generated by UniForm is stored separately in the table directory under the `/metadata/` subdirectory as versioned JSON files following the pattern `&amp;lt;table-path&amp;gt;/metadata/&amp;lt;version-number&amp;gt;-&amp;lt;uuid&amp;gt;.metadata.json`. These metadata files track their own snapshots and manifest files independently from Delta's transaction log.&lt;/P&gt;
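&lt;P&gt;As a rough illustration of that naming pattern, the Python sketch below picks the newest metadata file from a directory listing by comparing the numeric version prefix. The function name and the sample file names are hypothetical, and the regular expression assumes the file names follow exactly the pattern quoted above.&lt;/P&gt;
&lt;PRE&gt;
```python
import re

# Assumed file-name shape, taken from the pattern quoted above:
# (version-number)-(uuid).metadata.json
METADATA_NAME = re.compile(r"^(\d+)-([0-9a-fA-F-]+)\.metadata\.json$")


def latest_metadata_file(file_names):
    """Return the metadata file name with the highest version number, or None."""
    versioned = []
    for name in file_names:
        match = METADATA_NAME.match(name)
        if match:
            # Compare versions as integers so 00010 sorts after 00009.
            versioned.append((int(match.group(1)), name))
    if not versioned:
        return None
    return max(versioned)[1]


# Hypothetical listing of a table's metadata directory.
listing = [
    "00001-aaaaaaaa-0000-0000-0000-000000000001.metadata.json",
    "00010-bbbbbbbb-0000-0000-0000-000000000002.metadata.json",
    "00009-cccccccc-0000-0000-0000-000000000003.metadata.json",
    "snap-123.avro",
]
print(latest_metadata_file(listing))
```
&lt;/PRE&gt;
&lt;P&gt;A path-based client would then be pointed at the returned file; files that do not match the pattern are simply ignored.&lt;/P&gt;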
&lt;H3&gt;Critical Risk: Metadata Synchronization&lt;/H3&gt;
&lt;P&gt;A significant operational concern exists when using path-based Iceberg clients: &lt;STRONG&gt;users may encounter errors when querying Iceberg tables using out-of-date metadata versions after VACUUM removes Parquet data files from the Delta table&lt;/STRONG&gt;. This happens because:&lt;/P&gt;
&lt;P&gt;- The Iceberg metadata files may still reference data files that VACUUM has removed&lt;BR /&gt;- Path-based Iceberg clients require manual updating and refreshing of metadata JSON paths to read current table versions&lt;BR /&gt;- There's no automatic cleanup mechanism that removes stale Iceberg metadata when corresponding data files are vacuumed&lt;/P&gt;
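&lt;P&gt;To make the failure mode concrete, here is a minimal Python sketch of the mismatch. It assumes you already have two lists: the data files an older Iceberg metadata version refers to, and the files still present after VACUUM. Both lists and the function name are hypothetical; nothing here calls a Databricks or Iceberg API.&lt;/P&gt;
&lt;PRE&gt;
```python
def dangling_references(referenced_files, existing_files):
    """Files that a metadata version still points at but storage no longer holds."""
    return sorted(set(referenced_files) - set(existing_files))


# Hypothetical inputs: files listed by an out-of-date Iceberg metadata version,
# and the files left in storage after VACUUM ran on the Delta table.
stale_metadata_refs = ["part-0001.parquet", "part-0002.parquet", "part-0003.parquet"]
files_after_vacuum = ["part-0003.parquet", "part-0004.parquet"]

missing = dangling_references(stale_metadata_refs, files_after_vacuum)
print(missing)  # ['part-0001.parquet', 'part-0002.parquet']
```
&lt;/PRE&gt;
&lt;P&gt;Any non-empty result means a client reading through that metadata version would hit missing-file errors.&lt;/P&gt;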
&lt;H2&gt;Recommended Approach&lt;/H2&gt;
&lt;P&gt;To manage this setup effectively:&lt;/P&gt;
&lt;P&gt;1. &lt;STRONG&gt;Enable Predictive Optimization&lt;/STRONG&gt;: Databricks recommends enabling predictive optimization for Unity Catalog managed tables, which automatically handles VACUUM operations and maintenance tasks&lt;/P&gt;
&lt;P&gt;2. &lt;STRONG&gt;Monitor Metadata Status&lt;/STRONG&gt;: Use `DESCRIBE EXTENDED table_name` to check the `converted_delta_version` and `converted_delta_timestamp` fields to verify which Delta version corresponds to the current Iceberg metadata&lt;/P&gt;
&lt;P&gt;3. &lt;STRONG&gt;Manual Metadata Refresh&lt;/STRONG&gt;: If metadata becomes stale, use `MSCK REPAIR TABLE &amp;lt;table-name&amp;gt; SYNC METADATA` to manually trigger Iceberg metadata regeneration&lt;/P&gt;
&lt;P&gt;4. &lt;STRONG&gt;Coordinate Retention Periods&lt;/STRONG&gt;: Ensure your VACUUM retention period is long enough to account for any lag in Iceberg metadata updates and client access patterns&lt;/P&gt;
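&lt;P&gt;Steps 2 and 3 can be combined into a small check. The Python sketch below is illustrative only: it assumes you have already read the table's current Delta version (for example from DESCRIBE HISTORY) and the `converted_delta_version` value from DESCRIBE EXTENDED. The function name and the table name are hypothetical, and the code only builds the statement text rather than running it.&lt;/P&gt;
&lt;PRE&gt;
```python
def sync_statement_if_stale(table_name, current_delta_version, converted_delta_version):
    """Return the SYNC METADATA statement when Iceberg lags Delta, else None."""
    if converted_delta_version == current_delta_version:
        # Iceberg metadata already reflects the latest Delta commit.
        return None
    return "MSCK REPAIR TABLE {} SYNC METADATA".format(table_name)


# Hypothetical values: Delta is at version 42, Iceberg was last converted at 40.
print(sync_statement_if_stale("main.sales.orders", 42, 40))
# Versions match, so no statement is needed and None is printed.
print(sync_statement_if_stale("main.sales.orders", 42, 42))
```
&lt;/PRE&gt;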
&lt;P&gt;The key takeaway is that Iceberg metadata cleanup is &lt;STRONG&gt;not automatic&lt;/STRONG&gt;&amp;nbsp;when running VACUUM, and you must carefully manage metadata synchronization to prevent Iceberg clients from attempting to read files that have been removed by Delta's cleanup processes.&lt;/P&gt;
&lt;P&gt;Hope this helps, Louis.&lt;/P&gt;</description>
      <pubDate>Sun, 09 Nov 2025 15:24:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138294#M50905</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-09T15:24:27Z</dc:date>
    </item>
    <item>
      <title>Re: Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature?</title>
      <link>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138540#M50952</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Which actions should be used to clean up and maintain Iceberg metadata?&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;expireSnapshots:&lt;/STRONG&gt; Is it recommended to delete old snapshots using the same retention period as the Delta table?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;deleteOrphanFiles:&lt;/STRONG&gt; This deletes unreferenced Iceberg metadata as well as unreferenced data files. Is it safe to run this when some data might still be referenced by Delta metadata?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;rewriteManifests:&lt;/STRONG&gt; This action rewrites manifest files for optimization but also creates a new snapshot. Should this be executed?&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 11 Nov 2025 09:11:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138540#M50952</guid>
      <dc:creator>eyalholzmann</dc:creator>
      <dc:date>2025-11-11T09:11:50Z</dc:date>
    </item>
    <item>
      <title>Re: Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature?</title>
      <link>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138939#M51056</link>
      <description>&lt;P class="qt3gz91 paragraph"&gt;Here’s how to approach cleaning and maintaining Apache Iceberg metadata on Databricks, and how it differs from Delta workflows.&lt;/P&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;First, know your table type&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;For &lt;STRONG&gt;Unity Catalog–managed Iceberg tables&lt;/STRONG&gt;, Databricks runs table maintenance for you (predictive optimization) — including snapshot expiration and orphan-file cleanup — so you rarely need to run these actions manually.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;For &lt;STRONG&gt;foreign/external Iceberg tables&lt;/STRONG&gt; (or if you intentionally disable automation), you may choose to run specific Iceberg maintenance procedures yourself.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Action-by-action guidance&lt;/H3&gt;
&lt;H4 class="_7uu25p0 qt3gz9c _7pq7t612 heading4 _7uu25p1"&gt;expireSnapshots&lt;/H4&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Yes — &lt;STRONG&gt;expireSnapshots&lt;/STRONG&gt; is recommended to bound your time-travel/rollback window and keep metadata compact. On managed Iceberg, UC automates snapshot expiration; choose manual retention only when you need tighter control.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Don’t assume the &lt;STRONG&gt;same retention&lt;/STRONG&gt; as your Delta VACUUM. Set Iceberg’s retention to match your operational needs (time travel, audit requirements, longest-running jobs), independent of Delta’s retention checks. If you do run it manually, you can use Iceberg procedures, for example:&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Spark SQL (Iceberg stored procedure):&lt;BR /&gt;CALL &amp;lt;catalog&amp;gt;.system.expire_snapshots(table =&amp;gt; 'db.tbl', older_than =&amp;gt; CURRENT_TIMESTAMP - INTERVAL 7 DAYS);&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Or, in engines that expose maintenance through ALTER TABLE EXECUTE (Trino, for example):&lt;BR /&gt;ALTER TABLE db.tbl EXECUTE expire_snapshots(retention_threshold =&amp;gt; '7d');&lt;/P&gt;&lt;/LI&gt;
&lt;/UL&gt;
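&lt;P class="qt3gz91 paragraph"&gt;If you schedule snapshot expiration yourself, one option is to compute the cutoff outside SQL and pass it as a literal timestamp, which is the form the Iceberg Spark procedure documentation uses. The Python sketch below only builds the statement text. The helper name, the catalog name `my_catalog` and the fixed date are hypothetical, and it assumes the Spark session time zone is UTC.&lt;/P&gt;
&lt;PRE&gt;
```python
from datetime import datetime, timedelta, timezone


def expire_snapshots_sql(catalog, table, retention_days, now=None):
    """Build an expire_snapshots CALL with a literal cutoff timestamp."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    # Assumption: the session time zone is UTC, so the literal is read as UTC.
    literal = cutoff.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "CALL {catalog}.system.expire_snapshots("
        "table => '{table}', older_than => TIMESTAMP '{cutoff}')"
    ).format(catalog=catalog, table=table, cutoff=literal)


# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2025, 11, 13, 0, 0, 0, tzinfo=timezone.utc)
print(expire_snapshots_sql("my_catalog", "db.tbl", 7, now=fixed_now))
```
&lt;/PRE&gt;
&lt;P class="qt3gz91 paragraph"&gt;Keeping the retention in one variable makes it easy to set Iceberg's window independently of the Delta VACUUM retention.&lt;/P&gt;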
&lt;H4 class="_7uu25p0 qt3gz9c _7pq7t612 heading4 _7uu25p1"&gt;deleteOrphanFiles&lt;/H4&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Only run &lt;STRONG&gt;deleteOrphanFiles&lt;/STRONG&gt; when the table’s storage location is used exclusively by Iceberg and you’re certain those files aren’t referenced elsewhere. If the same Parquet files serve multiple formats (e.g., Delta with Iceberg reads/UniForm), deleting “orphans” from Iceberg’s perspective can break Delta readers that still reference them. In short: &lt;STRONG&gt;not safe if Delta still references those files&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Why: Databricks supports workflows where a single copy of Parquet data is served to multiple formats; removing files because they’re “unreferenced” in Iceberg can invalidate concurrent readers in Delta or path-based Iceberg clients until metadata is refreshed.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
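&lt;P class="qt3gz91 paragraph"&gt;The safety rule above can be expressed as a simple set check before anything is deleted. This Python sketch is a sketch only: it assumes you can produce the list of files Iceberg considers orphaned and the list of files the Delta log still references. Both inputs and the function name are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
def split_orphan_candidates(iceberg_orphans, delta_referenced):
    """Split Iceberg orphan candidates into deletable files and files Delta still needs."""
    candidates = set(iceberg_orphans)
    still_needed = candidates.intersection(delta_referenced)
    deletable = candidates.difference(still_needed)
    return sorted(deletable), sorted(still_needed)


# Hypothetical inputs for a location shared by Delta and Iceberg readers.
iceberg_orphans = ["part-0001.parquet", "part-0002.parquet", "tmp-0009.parquet"]
delta_referenced = ["part-0002.parquet", "part-0005.parquet"]

deletable, blocked = split_orphan_candidates(iceberg_orphans, delta_referenced)
print(deletable)  # ['part-0001.parquet', 'tmp-0009.parquet']
print(blocked)    # ['part-0002.parquet']
```
&lt;/PRE&gt;
&lt;P class="qt3gz91 paragraph"&gt;A non-empty blocked list is the signal that orphan cleanup is not safe for this location.&lt;/P&gt;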
&lt;H4 class="_7uu25p0 qt3gz9c _7pq7t612 heading4 _7uu25p1"&gt;rewriteManifests&lt;/H4&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;&lt;STRONG&gt;rewriteManifests&lt;/STRONG&gt; is safe and often beneficial — it rewrites manifest files for planning efficiency and creates a new snapshot (data remains unchanged). On managed Iceberg, UC periodically optimizes metadata for you; consider manual rewrites for external tables or after heavy streaming/append workloads that produce many small manifests.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Practical tips (when you run it yourself): target specific large or fragmented manifests instead of rewriting all; avoid Spark executor memory pressure by disabling aggressive caching during the operation (client-dependent).&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
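&lt;P class="qt3gz91 paragraph"&gt;One way to decide whether a manual rewrite is worthwhile is to look for many small manifests first. The Python sketch below assumes you have exported manifest paths and byte lengths (for example from the table's manifests metadata table) into plain dictionaries. The function name, the sample records and the 8 MiB threshold are illustrative assumptions, not recommended values.&lt;/P&gt;
&lt;PRE&gt;
```python
def manifests_to_rewrite(manifests, target_bytes):
    """Pick manifests smaller than target_bytes; return them only if two or more exist."""
    small = [m["path"] for m in manifests if target_bytes > m["length"]]
    # Rewriting a single small manifest would not reduce the manifest count.
    return small if len(small) >= 2 else []


# Hypothetical manifest records: path plus size in bytes.
manifests = [
    {"path": "m1.avro", "length": 12000},
    {"path": "m2.avro", "length": 9000000},
    {"path": "m3.avro", "length": 30000},
]
print(manifests_to_rewrite(manifests, 8 * 1024 * 1024))  # ['m1.avro', 'm3.avro']
```
&lt;/PRE&gt;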
&lt;HR /&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Summary recommendations&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;On &lt;STRONG&gt;managed Iceberg&lt;/STRONG&gt;: rely on UC’s automated maintenance; override manually only for special cases or compliance windows.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;On &lt;STRONG&gt;external/foreign Iceberg&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="qt3gz98 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;Use &lt;STRONG&gt;expireSnapshots&lt;/STRONG&gt; regularly (based on business SLAs),&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;Avoid &lt;STRONG&gt;deleteOrphanFiles&lt;/STRONG&gt; if any other table/format could still reference the same files (including Delta),&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;Run &lt;STRONG&gt;rewriteManifests&lt;/STRONG&gt; periodically to keep planning efficient, especially for streaming/high-churn tables.
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="qt3gz91 paragraph"&gt;Cheers, Louis.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Nov 2025 14:35:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/does-vacuum-on-delta-lake-also-clean-iceberg-metadata-when-using/m-p/138939#M51056</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-13T14:35:32Z</dc:date>
    </item>
  </channel>
</rss>

