Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

making REORG TABLE to enable Iceberg Uniform more efficient and faster

JameDavi_51481
Contributor

I am upgrading a large number of tables for Iceberg / Uniform compatibility by running

 
REORG TABLE <tablename> APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));
and finding that some tables take several hours to upgrade, presumably because they are rewriting all the Parquet files rather than just generating the Iceberg metadata.
 
Are there any configurations I can use to make this process more efficient? Or any guidelines for optimizing cluster parameters (number of nodes, instance size, etc.) to make these sorts of operations faster? I'm not keen on spending a few weeks getting Iceberg working on my tables.

sridharplv
Valued Contributor II

Hi @JameDavi_51481, hope you have tried this approach for enabling Iceberg metadata alongside the Delta format:

ALTER TABLE internal_poc_iceberg.iceberg_poc.clickstream_gold_sink_dlt
SET TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg'
); 
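After setting these properties, you can confirm that Uniform is enabled by inspecting the table properties (the table name below is just the example from above):

```sql
-- Verify that Iceberg metadata generation is enabled for the table
SHOW TBLPROPERTIES internal_poc_iceberg.iceberg_poc.clickstream_gold_sink_dlt;
-- Look for delta.universalFormat.enabledFormats = iceberg in the output
```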

Please let me know if not. If you have already used it and are still looking for a faster REORG with a complete rewrite, you can tweak cluster settings and configuration to speed it up:

1. Use a cluster with high parallelism

  • Use a larger cluster (more worker nodes) with:

    • High I/O throughput (EBS-optimized in AWS, or Premium SSD in Azure)

    • High memory-to-core ratio (e.g., i3, r5d, m5d instances in AWS)

  • Try Photon-enabled clusters if available; Photon often improves the performance of I/O-heavy workloads.

2. Run upgrades in parallel

If you're upgrading multiple tables, batch them in parallel using job clusters or workflows.
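One way to batch the upgrades is to submit the REORG statements from multiple threads in a single notebook, since Spark runs queries submitted from separate threads concurrently. This is a minimal sketch, not Databricks' documented method: `TABLES` and `run_sql` are placeholders you would replace with your own table list and `spark.sql`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical table list; replace with the tables you are upgrading.
TABLES = ["main.sales.orders", "main.sales.customers", "main.sales.events"]

def upgrade(table, run_sql):
    # On Databricks, pass run_sql=spark.sql so each REORG runs as its own query.
    return run_sql(
        f"REORG TABLE {table} APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2))"
    )

def upgrade_all(tables, run_sql, max_workers=4):
    # Submit one REORG per table; threads let several file rewrites overlap
    # on the same cluster instead of running strictly one after another.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upgrade, t, run_sql): t for t in tables}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Keep `max_workers` modest; too many concurrent rewrites will contend for the same executors. For full isolation you can instead run one table per job in a Databricks workflow.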

3. Use autoscaling job clusters

Configure autoscaling (minimum and maximum workers) on job clusters so capacity grows with each table's rewrite workload and shrinks when it finishes.
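As a sketch, an autoscaling job cluster spec might look like the fragment below (the keys follow the Databricks Clusters API; the runtime version, node type, and worker counts are illustrative assumptions, not recommendations):

```json
{
  "spark_version": "15.4.x-photon-scala2.12",
  "node_type_id": "i3.2xlarge",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 10
  }
}
```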
