
Unity Catalog Migration Strategy

ck7007
New Contributor III

Zero-Downtime Unity Catalog Migration for 500TB Data Lake

Just completed migrating 500TB to Unity Catalog without a single minute of downtime. Here's how:

The Challenge

  • 500 TB across 12,000 tables
  • 200+ concurrent users
  • Zero tolerance for downtime
  • Mixed Hive and Delta tables

The Solution: Parallel Sync Strategy

Step 1: Shadow Catalog Setup
def create_shadow_catalog(source_db, target_catalog):
    """
    Creates a UC catalog that shadows the existing Hive metastore
    """
    tables = spark.catalog.listTables(source_db)

    for table in tables:
        # Create external table pointing to same location
        spark.sql(f"""
            CREATE TABLE IF NOT EXISTS {target_catalog}.{source_db}.{table.name}
            USING DELTA
            LOCATION '{get_table_location(source_db, table.name)}'
        """)

        # Sync permissions
        sync_table_permissions(source_db, table.name, target_catalog)

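The two helpers above aren't defined in the post. A minimal sketch of what they might look like (my assumptions: Delta tables and legacy table ACLs; adjust for your workspace):

def get_table_location(db, table_name):
    # Storage path of the existing table, so the UC external table can
    # point at the same files with no data copy (DESCRIBE DETAIL is Delta-only)
    return spark.sql(f"DESCRIBE DETAIL {db}.{table_name}").first()["location"]

def sync_table_permissions(db, table_name, target_catalog):
    # Replay the legacy grants against the new UC table; the column names
    # follow the legacy SHOW GRANT output and may differ in your workspace
    for g in spark.sql(f"SHOW GRANT ON TABLE {db}.{table_name}").collect():
        spark.sql(
            f"GRANT {g['ActionType']} ON TABLE "
            f"{target_catalog}.{db}.{table_name} TO `{g['Principal']}`"
        )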

Step 2: Dual-Write Pattern
class DualWriter:
    """
    Writes to both Hive and UC during transition
    """
    def write_data(self, df, table_name):
        # Write to original Hive table
        df.write.mode("append").saveAsTable(f"hive_metastore.{table_name}")

        # Simultaneously write to UC
        df.write.mode("append").saveAsTable(f"main.prod.{table_name}")

        # Verify consistency
        assert verify_row_counts(f"hive_metastore.{table_name}",
                                 f"main.prod.{table_name}")

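verify_row_counts is left undefined above; a cheap version (hypothetical, comparing row counts only) could be:

def verify_row_counts(source_table, target_table):
    # Row counts are a fast smoke test; for stronger guarantees, compare
    # Delta table versions or per-column checksums instead
    return spark.table(source_table).count() == spark.table(target_table).count()
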
Step 3: Smart Query Router
import random

def route_query(query, user_group):
    """
    Gradually routes traffic to UC
    """
    migration_percentage = get_migration_percentage(user_group)

    if random.random() < migration_percentage:
        # Route to Unity Catalog
        return query.replace("hive_metastore.", "main.prod.")
    else:
        # Keep on Hive
        return query

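get_migration_percentage is also not shown; one way to back it is a static rollout table (the group names and fractions below are made up for illustration):

# Hypothetical rollout table: fraction of each group's traffic sent to UC
MIGRATION_ROLLOUT = {
    "readonly_analysts": 1.0,  # migrated first (see lesson 3 below)
    "bi_dashboards": 0.5,
    "etl_jobs": 0.1,
}

def get_migration_percentage(user_group):
    # Unknown groups stay on Hive until explicitly enrolled
    return MIGRATION_ROLLOUT.get(user_group, 0.0)

Ratcheting these fractions up over the three weeks is what makes the user migration gradual.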
Results

  • Migration time: 3 weeks (running in the background)
  • Downtime: ZERO
  • Failed queries: 0.01% (auto-retried)
  • Performance gain: 23% faster queries post-migration

Key Lessons

  1. Never use "DEEP CLONE" for large tables—too slow
  2. External tables are your friend—same data, different metadata
  3. Test with read-only users first—lower risk
  4. Monitor table drift—catches issues early (see the sketch below)
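
For lesson 4, a minimal drift check (illustrative; the real monitor presumably does more than compare counts):

def check_drift(table_pairs):
    # Compare each Hive/UC pair and collect mismatches; run this as a
    # scheduled job so divergence is caught early
    return [
        (hive_t, uc_t)
        for hive_t, uc_t in table_pairs
        if spark.table(hive_t).count() != spark.table(uc_t).count()
    ]

# Example: alert on any dual-written table that diverges
drifted = check_drift([("hive_metastore.prod.orders", "main.prod.orders")])
if drifted:
    raise RuntimeError(f"Drift detected: {drifted}")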

Rollback Strategy (Saved Us Twice!)

# Instant rollback if issues are detected
spark.conf.set("spark.sql.defaultCatalog", "hive_metastore")
# All queries automatically revert to Hive


Anyone else doing UC migration? What patterns worked for you?

P.S. I created a full migration toolkit—DM me if interested!


4 REPLIES

Khaja_Zaffer
Contributor

Hey @ck7007 


That's a good hands-on project! What kind of toolkit do you have, sir?

BS_THE_ANALYST
Esteemed Contributor

@ck7007 I'd love to check out the toolkit as well & thanks for sharing this strategy.

I'm curious, is this a typical migration to Unity Catalog? ☺️

Also, with it taking 3 weeks, there must be some chance of data not being the "latest"? 🤔 Curious if that's the benefit of downtime: that you can take one snapshot and get all the latest data over?

All the best,
BS

 

Hello @ck7007

I want to do a real-time project.

Waiting for the DM.

ck7007
New Contributor III
(Accepted solution)

Thanks, @Khaja_Zaffer and @BS_THE_ANALYST!

@Khaja_Zaffer: The toolkit has 5 main components:

  1. Pre-migration analyzer: Compatibility scoring (toy sketch below)
  2. Drift monitor: Real-time consistency checks
  3. Permission migrator: Automated ACL copying
  4. Query rewriter: Hive→UC SQL converter
  5. Rollback orchestrator: One-click recovery
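
The analyzer idea, as a toy sketch (the heuristic and names are mine, not the actual toolkit):

def compatibility_score(db, table_name):
    # Delta tables can be shadowed in place as external tables; other
    # formats (e.g., plain Parquet or CSV Hive tables) need conversion first
    provider = (
        spark.sql(f"DESCRIBE TABLE EXTENDED {db}.{table_name}")
        .where("col_name = 'Provider'")
        .first()
    )
    return 1.0 if provider and provider["data_type"].lower() == "delta" else 0.5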

Sending you the GitHub link via DM!

@BS_THE_ANALYST: Excellent observation! This isn't typical; most teams take downtime. The 3 weeks with zero stale data worked because:

# Dual-write pattern: every transaction hits both systems
write_to_hive(df) + write_to_uc(df)  # atomic operation (sketch below)

  • Week 1: Historical sync
  • Weeks 2-3: Dual writes (keeping data fresh) + gradual user migration
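
The "+" above is shorthand, not a real atomic primitive. One hedged way to approximate it is to undo the Hive append with Delta RESTORE if the UC write fails (my sketch, not the toolkit's code):

from delta.tables import DeltaTable

def dual_write(df, table_name):
    hive_name = f"hive_metastore.{table_name}"
    uc_name = f"main.prod.{table_name}"
    # Remember the Hive table version so the append can be undone
    before = DeltaTable.forName(spark, hive_name).history(1).first()["version"]
    df.write.mode("append").saveAsTable(hive_name)
    try:
        df.write.mode("append").saveAsTable(uc_name)
    except Exception:
        # UC write failed: restore Hive to its pre-write version so the
        # two systems never diverge, then surface the error for retry
        spark.sql(f"RESTORE TABLE {hive_name} TO VERSION AS OF {before}")
        raise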

The dual-write ensures data is ALWAYS current in both systems. No catch-up needed!

You're right that downtime is simpler (one snapshot, done), but for 24/7 operations, this complexity pays off.

Happy to dive deeper into any specific aspect!