Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

making REORG TABLE to enable Iceberg Uniform more efficient and faster

JameDavi_51481
Contributor

I am upgrading a large number of tables for Iceberg / Uniform compatibility by running

 
REORG TABLE <tablename> APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));
and finding that some tables take several hours to upgrade, presumably because they are rewriting all the Parquet files rather than just generating the Iceberg metadata.
 
Are there any configurations I can use to make this process more efficient? Or any guidelines for optimizing cluster parameters (number of nodes, instance size, etc.) to make these sorts of operations faster? I'm not keen on spending a few weeks getting Iceberg working on my tables.

sridharplv
Valued Contributor II

Hi @JameDavi_51481, hope you have tried this approach for enabling Iceberg metadata alongside the Delta format:

ALTER TABLE internal_poc_iceberg.iceberg_poc.clickstream_gold_sink_dlt
SET TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg'
); 
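After setting these properties, you can confirm that Uniform is enabled by inspecting the table properties (the table name below is just the example from above):

```sql
-- Verify that Iceberg metadata generation is enabled for the table
SHOW TBLPROPERTIES internal_poc_iceberg.iceberg_poc.clickstream_gold_sink_dlt;
-- Look for delta.universalFormat.enabledFormats = iceberg in the output
```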

Please let me know if not. If you have already used it and are still looking for a faster REORG with a complete rewrite, you can tweak cluster settings and configuration to speed it up:

1. Use a cluster with high parallelism

  • Use a larger cluster (more worker nodes) with:

    • High I/O throughput (EBS-optimized in AWS, or Premium SSD in Azure)

    • High memory-to-core ratio (e.g., i3, r5d, m5d instances in AWS)

  • Try Photon-enabled clusters if available; Photon often improves the performance of I/O-heavy workloads.

2. Run upgrades in parallel

If you're upgrading multiple tables, batch them in parallel using job clusters or workflows.
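One way to batch the upgrades is to submit the REORG statements from multiple threads in a single notebook, since Spark runs queries submitted from separate threads concurrently. This is a minimal sketch, not Databricks' documented method: `TABLES` and `run_sql` are placeholders you would replace with your own table list and `spark.sql`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical table list; replace with the tables you are upgrading.
TABLES = ["main.sales.orders", "main.sales.customers", "main.sales.events"]

def upgrade(table, run_sql):
    # On Databricks, pass run_sql=spark.sql so each REORG runs as its own query.
    return run_sql(
        f"REORG TABLE {table} APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2))"
    )

def upgrade_all(tables, run_sql, max_workers=4):
    # Submit one REORG per table; threads let several file rewrites overlap
    # on the same cluster instead of running strictly one after another.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upgrade, t, run_sql): t for t in tables}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Keep `max_workers` modest; too many concurrent rewrites will contend for the same executors. For full isolation you can instead run one table per job in a Databricks workflow.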

3. Use autoscaling job clusters

Configure autoscaling (minimum and maximum workers) on job clusters so capacity grows with each table's rewrite workload and shrinks when it finishes.
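As a sketch, an autoscaling job cluster spec might look like the fragment below (the keys follow the Databricks Clusters API; the runtime version, node type, and worker counts are illustrative assumptions, not recommendations):

```json
{
  "spark_version": "15.4.x-photon-scala2.12",
  "node_type_id": "i3.2xlarge",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 10
  }
}
```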
