Sunday
I'm working with Delta tables using the Iceberg Uniform feature to enable Iceberg-compatible reads. I'm trying to understand how metadata cleanup works in this setup.
Specifically, does the VACUUM operation, which removes old Delta Lake metadata based on the retention period, also trigger deletion of the corresponding Iceberg metadata? Or is Iceberg metadata managed separately, requiring its own cleanup process?
Sunday
Great question @eyalholzmann,
In Databricks Delta Lake with the Iceberg Uniform feature, VACUUM operations on the Delta table do NOT automatically clean up the corresponding Iceberg metadata. The two metadata layers are managed separately, and understanding this distinction is critical to avoid potential data corruption and query failures.
When you run VACUUM on a Delta table with Iceberg Uniform enabled, the operation removes Parquet data files that are no longer referenced by Delta Lake metadata based on the retention period you specify. This standard Delta Lake cleanup process only considers the Delta transaction log when determining which files to remove.
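For reference, this is what a manual run looks like; a minimal sketch assuming a Unity Catalog table named `main.sales.orders` and a 7-day retention window (both are placeholders):

```sql
-- Preview the files that VACUUM would delete, without removing anything
VACUUM main.sales.orders RETAIN 168 HOURS DRY RUN;

-- Remove data files no longer referenced by the Delta transaction log
-- and older than the retention window
VACUUM main.sales.orders RETAIN 168 HOURS;
```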
The Iceberg metadata generated by UniForm is stored separately in the table directory under the `/metadata/` subdirectory as versioned JSON files following the pattern `<table-path>/metadata/<version-number>-<uuid>.metadata.json`. These metadata files track their own snapshots and manifest files independently from Delta's transaction log.
A significant operational concern exists when using path-based Iceberg clients: users may encounter errors when querying Iceberg tables using out-of-date metadata versions after VACUUM removes Parquet data files from the Delta table. This happens because:
- The Iceberg metadata files may still reference data files that VACUUM has removed
- Path-based Iceberg clients require manual updating and refreshing of metadata JSON paths to read current table versions
- There's no automatic cleanup mechanism that removes stale Iceberg metadata when corresponding data files are vacuumed
To manage this setup effectively:
1. Enable Predictive Optimization: Databricks recommends enabling predictive optimization for Unity Catalog managed tables, which automatically handles VACUUM operations and maintenance tasks
2. Monitor Metadata Status: Use `DESCRIBE EXTENDED table_name` to check the `converted_delta_version` and `converted_delta_timestamp` fields to verify which Delta version corresponds to the current Iceberg metadata
3. Manual Metadata Refresh: If metadata becomes stale, use `MSCK REPAIR TABLE <table-name> SYNC METADATA` to manually trigger Iceberg metadata regeneration (see the snippet after this list)
4. Coordinate Retention Periods: Ensure your VACUUM retention period is long enough to account for any lag in Iceberg metadata updates and client access patterns
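A quick sketch of items 2 and 3 above (the table name is a placeholder):

```sql
-- Check which Delta version/timestamp the current Iceberg metadata was generated from
-- (inspect the converted_delta_version and converted_delta_timestamp fields in the output)
DESCRIBE EXTENDED main.sales.orders;

-- Manually re-trigger Iceberg metadata generation if it has fallen behind
MSCK REPAIR TABLE main.sales.orders SYNC METADATA;
```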
The key takeaway is that Iceberg metadata cleanup is not automatic when running VACUUM, and you must carefully manage metadata synchronization to prevent Iceberg clients from attempting to read files that have been removed by Delta's cleanup processes.
Hope this helps, Louis.
Tuesday
Which actions should be used to clean up and maintain Iceberg metadata?
expireSnapshots: Is it recommended to delete old snapshots using the same retention period as the Delta table?
deleteOrphanFiles: This deletes unreferenced Iceberg metadata as well as unreferenced data files. Is it safe to run this when some data might still be referenced by Delta metadata?
rewriteManifests: This action rewrites manifest files for optimization but also creates a new snapshot. Should this be executed?
15 hours ago
Here's how to approach cleaning and maintaining Apache Iceberg metadata on Databricks, and how it differs from Delta workflows.
For Unity Catalog-managed Iceberg tables, Databricks runs table maintenance for you via predictive optimization, including snapshot expiration and orphan-file cleanup, so you rarely need to run these actions manually.
For foreign/external Iceberg tables (or if you intentionally disable automation), you may choose to run specific Iceberg maintenance procedures yourself.
Yes: expireSnapshots is recommended to bound your time-travel/rollback window and keep metadata compact. On managed Iceberg, UC automates snapshot expiration; choose manual retention only when you need tighter control.
Don't assume the same retention as your Delta VACUUM. Set Iceberg's retention to match your operational needs (time travel, audit requirements, longest-running jobs), independent of Delta's retention settings. If you do run it manually, you can use Iceberg procedures, for example:
```sql
CALL <catalog>.system.expire_snapshots(
  table => 'db.tbl',
  older_than => CURRENT_TIMESTAMP - INTERVAL 7 DAYS
);
```
Only run deleteOrphanFiles when the table's storage location is used exclusively by Iceberg and you're certain those files aren't referenced elsewhere. If the same Parquet files serve multiple formats (e.g., Delta with Iceberg reads/UniForm), deleting "orphans" from Iceberg's perspective can break Delta readers that still reference them. In short: not safe if Delta still references those files.
Why: Databricks supports workflows where a single copy of Parquet data is served to multiple formats; removing files because they're "unreferenced" in Iceberg can invalidate concurrent readers in Delta or path-based Iceberg clients until metadata is refreshed.
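If you do run it on a location used exclusively by Iceberg, start with a dry run so you can review the candidate files first; a minimal sketch using the Iceberg Spark `remove_orphan_files` procedure (catalog and table names are placeholders):

```sql
-- List the files the procedure would delete, without removing anything
CALL <catalog>.system.remove_orphan_files(table => 'db.tbl', dry_run => true);
```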
rewriteManifests is safe and often beneficial: it rewrites manifest files for planning efficiency and creates a new snapshot (data remains unchanged). On managed Iceberg, UC periodically optimizes metadata for you; consider manual rewrites for external tables or after heavy streaming/append workloads that produce many small manifests.
Practical tips (when you run it yourself): target specific large or fragmented manifests instead of rewriting all; avoid Spark executor memory pressure by disabling aggressive caching during the operation (client-dependent).
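For external tables, a minimal sketch with the Iceberg Spark `rewrite_manifests` procedure (names are placeholders; setting `use_caching => false` avoids caching the manifest dataset on executors at the cost of some speed):

```sql
-- Rewrite small/fragmented manifests into fewer, better-organized ones (creates a new snapshot)
CALL <catalog>.system.rewrite_manifests(table => 'db.tbl', use_caching => false);
```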
On managed Iceberg: rely on UC's automated maintenance; override manually only for special cases or compliance windows.
On external/foreign Iceberg: schedule expireSnapshots to enforce your retention window, run deleteOrphanFiles only on storage locations used exclusively by Iceberg, and consider rewriteManifests after heavy append workloads.
Cheers, Louis.