<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unity Catalog Migration Strategy in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131313#M668</link>
    <description>&lt;P class=""&gt;Thanks, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/173840"&gt;@Khaja_Zaffer&lt;/a&gt; and &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;!&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/173840"&gt;@Khaja_Zaffer&lt;/a&gt;:&lt;/STRONG&gt; The toolkit has 5 main components:&lt;/P&gt;&lt;OL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;Pre-migration analyzer:&lt;/STRONG&gt; compatibility scoring&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Drift monitor:&lt;/STRONG&gt; real-time consistency checks&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Permission migrator:&lt;/STRONG&gt; automated ACL copying&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Query rewriter:&lt;/STRONG&gt; Hive→UC SQL converter&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Rollback orchestrator:&lt;/STRONG&gt; one-click recovery&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;Sending you the GitHub link via DM!&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;:&lt;/STRONG&gt; Excellent observation! This isn't typical; most migrations use downtime. The 3 weeks with zero stale data worked because:&lt;/P&gt;&lt;PRE&gt;# Dual-write pattern: every write hits both systems
write_to_hive(df)
write_to_uc(df)
# Both writes run for every batch, followed by a row-count check&lt;/PRE&gt;&lt;UL class=""&gt;&lt;LI&gt;Week 1: Historical sync&lt;/LI&gt;&lt;LI&gt;Weeks 2-3: Dual writes (keeping data fresh) + gradual user migration&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;The dual-write ensures data is ALWAYS current in both systems. No catch-up needed!&lt;/P&gt;&lt;P class=""&gt;You're right that downtime is simpler (one snapshot, done), but for 24/7 operations, this complexity pays off.&lt;/P&gt;&lt;P class=""&gt;Happy to dive deeper into any specific aspect!&lt;/P&gt;</description>
    <pubDate>Mon, 08 Sep 2025 22:44:52 GMT</pubDate>
    <dc:creator>ck7007</dc:creator>
    <dc:date>2025-09-08T22:44:52Z</dc:date>
    <item>
      <title>Unity Catalog Migration Strategy</title>
      <link>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131155#M665</link>
      <description>&lt;H1&gt;Zero-Downtime Unity Catalog Migration for 500 TB Data Lake&lt;/H1&gt;&lt;P class=""&gt;Just completed migrating 500 TB to Unity Catalog without a single minute of downtime. Here's how:&lt;/P&gt;&lt;H2&gt;The Challenge&lt;/H2&gt;&lt;UL class=""&gt;&lt;LI&gt;500 TB across 12,000 tables&lt;/LI&gt;&lt;LI&gt;200+ concurrent users&lt;/LI&gt;&lt;LI&gt;Zero tolerance for downtime&lt;/LI&gt;&lt;LI&gt;Mixed Hive and Delta tables&lt;/LI&gt;&lt;/UL&gt;&lt;H2&gt;The Solution: Parallel Sync Strategy&lt;/H2&gt;&lt;H3&gt;Step 1: Shadow Catalog Setup&lt;/H3&gt;&lt;PRE&gt;def create_shadow_catalog(source_db, target_catalog):
    """
    Creates a UC catalog that shadows the existing Hive metastore
    """
    # get_table_location and sync_table_permissions are helpers not shown here
    tables = spark.catalog.listTables(source_db)

    for table in tables:
        # Create external table pointing to same location
        spark.sql(f"""
            CREATE TABLE IF NOT EXISTS {target_catalog}.{source_db}.{table.name}
            USING DELTA
            LOCATION '{get_table_location(source_db, table.name)}'
        """)

        # Sync permissions
        sync_table_permissions(source_db, table.name, target_catalog)&lt;/PRE&gt;&lt;H3&gt;Step 2: Dual-Write Pattern&lt;/H3&gt;&lt;PRE&gt;class DualWriter:
    """
    Writes to both Hive and UC during transition
    """
    def write_data(self, df, table_name):
        # Write to original Hive table
        df.write.mode("append").saveAsTable(f"hive_metastore.{table_name}")

        # Then write the same batch to UC
        df.write.mode("append").saveAsTable(f"main.prod.{table_name}")

        # Verify consistency (verify_row_counts is a helper not shown here)
        assert verify_row_counts(f"hive_metastore.{table_name}",
                                 f"main.prod.{table_name}")&lt;/PRE&gt;&lt;H3&gt;Step 3: Smart Query Router&lt;/H3&gt;&lt;PRE&gt;import random

def route_query(query, user_group):
    """
    Gradually routes traffic to UC
    """
    # get_migration_percentage is a helper not shown here; it is expected
    # to return a fraction between 0.0 and 1.0 for the given user group
    migration_percentage = get_migration_percentage(user_group)

    if random.random() &lt; migration_percentage:
        # Route to Unity Catalog
        return query.replace("hive_metastore.", "main.prod.")
    else:
        # Keep on Hive
        return query&lt;/PRE&gt;&lt;H2&gt;Results&lt;/H2&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;Migration time:&lt;/STRONG&gt; 3 weeks (running in the background)&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Downtime:&lt;/STRONG&gt; ZERO&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Failed queries:&lt;/STRONG&gt; 0.01% (auto-retried)&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Performance gain:&lt;/STRONG&gt; 23% faster queries post-migration&lt;/LI&gt;&lt;/UL&gt;&lt;H2&gt;Key Lessons&lt;/H2&gt;&lt;OL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;Never use "DEEP CLONE"&lt;/STRONG&gt; for large tables: too slow&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;External tables are your friend:&lt;/STRONG&gt; same data, different metadata&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Test with read-only users first:&lt;/STRONG&gt; lower risk&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Monitor table drift:&lt;/STRONG&gt; catches issues early&lt;/LI&gt;&lt;/OL&gt;&lt;H2&gt;Rollback Strategy (Saved Us Twice!)&lt;/H2&gt;&lt;PRE&gt;# Instant rollback if issues are detected:
# switch the session's current catalog back to Hive
spark.sql("USE CATALOG hive_metastore")
# Unqualified table names in this session resolve against Hive again&lt;/PRE&gt;&lt;P class=""&gt;Anyone else doing a UC migration? What patterns worked for you?&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;P.S.&lt;/STRONG&gt; I created a full migration toolkit. DM me if interested!&lt;/P&gt;</description>
      <pubDate>Sun, 07 Sep 2025 07:01:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131155#M665</guid>
      <dc:creator>ck7007</dc:creator>
      <dc:date>2025-09-07T07:01:30Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Migration Strategy</title>
      <link>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131163#M666</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/180185"&gt;@ck7007&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;That's a good hands-on project! What kind of toolkit do you have, sir?&lt;/P&gt;</description>
      <pubDate>Sun, 07 Sep 2025 10:27:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131163#M666</guid>
      <dc:creator>Khaja_Zaffer</dc:creator>
      <dc:date>2025-09-07T10:27:04Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Migration Strategy</title>
      <link>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131171#M667</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/180185"&gt;@ck7007&lt;/a&gt;&amp;nbsp;I'd love to check out the toolkit as well &amp;amp; thanks for sharing this strategy.&lt;BR /&gt;&lt;BR /&gt;I'm curious, is this a typical migration to Unity Catalog? ☺️&lt;BR /&gt;&lt;BR /&gt;Also, with it taking 3 weeks, there must be some chance of the data not being the "latest"? &lt;span class="lia-unicode-emoji" title=":thinking_face:"&gt;🤔&lt;/span&gt; Is the benefit of downtime that you can bring all the latest data over in one go?&lt;BR /&gt;&lt;BR /&gt;All the best,&lt;BR /&gt;BS&lt;/P&gt;</description>
      <pubDate>Sun, 07 Sep 2025 14:11:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131171#M667</guid>
      <dc:creator>BS_THE_ANALYST</dc:creator>
      <dc:date>2025-09-07T14:11:30Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Migration Strategy</title>
      <link>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131313#M668</link>
      <description>&lt;P class=""&gt;Thanks, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/173840"&gt;@Khaja_Zaffer&lt;/a&gt; and &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;!&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/173840"&gt;@Khaja_Zaffer&lt;/a&gt;:&lt;/STRONG&gt; The toolkit has 5 main components:&lt;/P&gt;&lt;OL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;Pre-migration analyzer:&lt;/STRONG&gt; compatibility scoring&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Drift monitor:&lt;/STRONG&gt; real-time consistency checks&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Permission migrator:&lt;/STRONG&gt; automated ACL copying&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Query rewriter:&lt;/STRONG&gt; Hive→UC SQL converter&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Rollback orchestrator:&lt;/STRONG&gt; one-click recovery&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;Sending you the GitHub link via DM!&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;:&lt;/STRONG&gt; Excellent observation! This isn't typical; most migrations use downtime. The 3 weeks with zero stale data worked because:&lt;/P&gt;&lt;PRE&gt;# Dual-write pattern: every write hits both systems
write_to_hive(df)
write_to_uc(df)
# Both writes run for every batch, followed by a row-count check&lt;/PRE&gt;&lt;UL class=""&gt;&lt;LI&gt;Week 1: Historical sync&lt;/LI&gt;&lt;LI&gt;Weeks 2-3: Dual writes (keeping data fresh) + gradual user migration&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;The dual-write ensures data is ALWAYS current in both systems. No catch-up needed!&lt;/P&gt;&lt;P class=""&gt;You're right that downtime is simpler (one snapshot, done), but for 24/7 operations, this complexity pays off.&lt;/P&gt;&lt;P class=""&gt;Happy to dive deeper into any specific aspect!&lt;/P&gt;</description>
      <pubDate>Mon, 08 Sep 2025 22:44:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131313#M668</guid>
      <dc:creator>ck7007</dc:creator>
      <dc:date>2025-09-08T22:44:52Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Migration Strategy</title>
      <link>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131315#M669</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/180185"&gt;@ck7007&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I want to do a real-time project.&lt;/P&gt;&lt;P&gt;Waiting for the DM.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Sep 2025 23:46:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/unity-catalog-migration-strategy/m-p/131315#M669</guid>
      <dc:creator>Khaja_Zaffer</dc:creator>
      <dc:date>2025-09-08T23:46:34Z</dc:date>
    </item>
  </channel>
</rss>

