
Considerations Before Migrating Hive Tables to Unity Catalog

ChsAIkrishna
New Contributor II

Databricks recommends four methods to migrate Hive tables to Unity Catalog, each with its pros and cons, and I would add a fifth, do-it-yourself option. The right choice depends on your specific requirements.

  1. SYNC: A SQL command that upgrades a schema or individual tables to Unity Catalog external tables. However, tracking the migration status across many tables (e.g., thousands) can be challenging.
  2. CLONE: A SQL command that performs a deep clone, migrating Hive-managed tables to Unity Catalog managed tables. It can be run table by table, which makes it a recommended approach.
  3. UCX: A command-line tool from Databricks Labs that is not fully supported for production use. It may work for some applications, but it isn't recommended for production migrations (just my thought: you might still find it useful for migration).
  4. Unity Catalog Upgrade Wizard: A UI-based tool for quickly upgrading Hive tables to Unity Catalog external tables. However, it's not suitable for large-scale production migrations.
  5. Build your own Databricks workflow/API on top of SYNC and CLONE: in my experience, every migration eventually ends up with this method (see the sketch right after this list).
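
To make options 1, 2, and 5 concrete, here is a minimal sketch of how SYNC and DEEP CLONE could be scripted from a notebook or workflow task. The catalog, schema, and table names are placeholders, and it assumes the target catalog (here `main`) and any required external locations already exist; `spark` is the ambient SparkSession of a Databricks notebook or job.

```python
# Minimal sketch only: catalog, schema, and table names are placeholders, and the
# "main" catalog plus any required external locations are assumed to exist already.

def upgrade_external_table(schema: str, table: str) -> None:
    """For external Hive tables: SYNC upgrades them to UC external tables in place."""
    # Append "DRY RUN" to the statement to preview the result without changing anything.
    spark.sql(
        f"SYNC TABLE main.{schema}.{table} FROM hive_metastore.{schema}.{table}"
    ).show(truncate=False)  # the result includes status information you can log

def clone_managed_table(schema: str, table: str) -> None:
    """For Hive-managed tables: DEEP CLONE copies the data into a UC managed table."""
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS main.{schema}.{table} "
        f"DEEP CLONE hive_metastore.{schema}.{table}"
    )

# Option 5 is essentially looping these over your own inventory of tables
# (e.g. from a config file or a control table) inside a Databricks Workflow.
for tbl in ["sales", "customers"]:          # hypothetical table names
    upgrade_external_table("default", tbl)  # or clone_managed_table("default", tbl)
```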

 

What am I thinking?


Migrating to Unity Catalog is not just a matter of recreating table objects from Hive in Unity Catalog. Instead, treat it as an opportunity to build a more robust and efficient data lake by leveraging Databricks' advanced features, such as liquid clustering, SQL Warehouses, Delta Live Tables, governance and security, Delta Sharing, and Databricks Asset Bundles.

When you initially built your application on Hive tables, Databricks may not have been as advanced as it is today, and your data pipelines were likely built with notebooks, ADF, and complex Scala JARs. You may also not have been following software engineering best practices such as test coverage and continuous integration/continuous deployment (CI/CD).

It's a known fact: two or three years ago you may not have been an SME in Databricks, but since then you have gained a deeper understanding of the tool and its features, along with hands-on experience with your application, platform, and cloud services, which has changed your perspective and approach to using Databricks.

Migrating Hive tables to UC while reusing the same legacy data pipelines is like:

"Same salted recipe, but the plate changed" – if the recipe is the same, the taste is the same.

Kick out the notebook deployments and Scala code (sorry, Scala devs!) and instead write your code in DLT, Python, or dbt (hello, SQL devs!). Use Databricks Asset Bundles to consolidate your entire application infrastructure in one place. Establish uniform code patterns and sustainable practices, and leverage DLT's data quality capabilities (expectations) to design robust data pipelines.
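
As an illustration of that direction, here is a small DLT sketch with declarative data quality expectations. The table names, landing path, and expectation rules are hypothetical, not a prescription for your pipelines.

```python
# Hypothetical DLT pipeline sketch: table names, the landing path, and the
# expectation rules below are illustrative only.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders auto-loaded from cloud storage (path is a placeholder).")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/landing/orders")
    )

@dlt.table(comment="Cleaned orders with declarative quality checks.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop bad rows
@dlt.expect("positive_amount", "amount > 0")                   # track violations
def orders_clean():
    return dlt.read_stream("orders_raw").withColumn("ingested_at", F.current_timestamp())
```

A Databricks Asset Bundle (databricks.yml) can then declare this pipeline, its jobs, and its targets so the whole application deploys from one repository.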

 

Conclusion:

Take the UC migration as an opportunity to refactor your data lake application.
