<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic making REORG TABLE to enable Iceberg Uniform more efficient and faster in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/making-reorg-table-to-enable-iceberg-uniform-more-efficient-and/m-p/125017#M47320</link>
<description>&lt;P&gt;I am upgrading a large number of tables for Iceberg / UniForm compatibility by running&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;REORG TABLE &amp;lt;tablename&amp;gt; APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));&lt;/LI-CODE&gt;&lt;DIV class=""&gt;and finding that some tables take several hours to upgrade, presumably because all the Parquet files are being rewritten rather than just the Iceberg metadata being generated.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Are there any configurations I can use to make this process more efficient? Or any guidelines for optimizing cluster parameters (number of nodes, instance size, etc.) to make these sorts of operations faster? I'm not keen on spending a few weeks getting Iceberg working on my tables.&lt;/DIV&gt;</description>
    <pubDate>Sat, 12 Jul 2025 18:48:54 GMT</pubDate>
    <dc:creator>JameDavi_51481</dc:creator>
    <dc:date>2025-07-12T18:48:54Z</dc:date>
    <item>
      <title>making REORG TABLE to enable Iceberg Uniform more efficient and faster</title>
      <link>https://community.databricks.com/t5/data-engineering/making-reorg-table-to-enable-iceberg-uniform-more-efficient-and/m-p/125017#M47320</link>
<description>&lt;P&gt;I am upgrading a large number of tables for Iceberg / UniForm compatibility by running&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;REORG TABLE &amp;lt;tablename&amp;gt; APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));&lt;/LI-CODE&gt;&lt;DIV class=""&gt;and finding that some tables take several hours to upgrade, presumably because all the Parquet files are being rewritten rather than just the Iceberg metadata being generated.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Are there any configurations I can use to make this process more efficient? Or any guidelines for optimizing cluster parameters (number of nodes, instance size, etc.) to make these sorts of operations faster? I'm not keen on spending a few weeks getting Iceberg working on my tables.&lt;/DIV&gt;</description>
      <pubDate>Sat, 12 Jul 2025 18:48:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/making-reorg-table-to-enable-iceberg-uniform-more-efficient-and/m-p/125017#M47320</guid>
      <dc:creator>JameDavi_51481</dc:creator>
      <dc:date>2025-07-12T18:48:54Z</dc:date>
    </item>
    <item>
      <title>Re: making REORG TABLE to enable Iceberg Uniform more efficient and faster</title>
      <link>https://community.databricks.com/t5/data-engineering/making-reorg-table-to-enable-iceberg-uniform-more-efficient-and/m-p/125020#M47322</link>
<description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/37890"&gt;@JameDavi_51481&lt;/a&gt;, have you tried this approach for enabling Iceberg metadata alongside the Delta format?&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;ALTER TABLE internal_poc_iceberg.iceberg_poc.clickstream_gold_sink_dlt
SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);&lt;/LI-CODE&gt;&lt;P&gt;Please let me know if that helps. If you have already used it and are still looking for a faster REORG with a complete rewrite, you can tune the cluster settings and configuration to speed it up.&lt;/P&gt;&lt;H4&gt;1. &lt;STRONG&gt;Use a cluster with high parallelism&lt;/STRONG&gt;&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use a &lt;STRONG&gt;larger cluster&lt;/STRONG&gt; (more worker nodes) with:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;High I/O throughput (EBS-optimized instances in AWS, or Premium SSD in Azure)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;A high memory-to-core ratio (e.g., i3, r5d, or m5d instances in AWS)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Try &lt;STRONG&gt;Photon-enabled&lt;/STRONG&gt; clusters if available; Photon often improves the performance of I/O-heavy workloads.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;2. &lt;STRONG&gt;Run upgrades in parallel&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;If you're upgrading multiple tables, &lt;STRONG&gt;batch them in parallel&lt;/STRONG&gt; using job clusters or workflows.&lt;/P&gt;&lt;H4&gt;3. &lt;STRONG&gt;Use autoscaling job clusters&lt;/STRONG&gt;&lt;/H4&gt;</description>
      <pubDate>Sat, 12 Jul 2025 19:08:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/making-reorg-table-to-enable-iceberg-uniform-more-efficient-and/m-p/125020#M47322</guid>
      <dc:creator>sridharplv</dc:creator>
      <dc:date>2025-07-12T19:08:10Z</dc:date>
    </item>
  </channel>
</rss>

