Administration & Architecture
Unity Catalog - Created UC and linked it to my DEV storage account for the entire org

Daalip808
New Contributor

Hello everyone,

I was the lead on a data platform modernization project. This was my first time administering Databricks, and I got myself into quite a situation. Essentially, I made the mistake of linking our enterprise-wide Unity Catalog to our DEV Azure storage account, meaning all catalogs created going forward will be stored in this dev storage account, which was created specifically for an individual project.

My goal is to move towards a Databricks-managed storage account so that there won't be a storage account for us to manage. I know there is no way to remove the storage account from the UC, so I would have to delete and recreate the catalog. This would mean losing all of the metadata in our current UC.

Our current setup looks like this: three environments (dev, uat, prod). Each environment has its own dedicated Databricks workspace and an Azure Data Lake Gen2 storage account. All of the UC tables are stored in the storage accounts, so the customer data will remain after UC deletion. My concern is losing all of the metadata and permissions that we have set up thus far.

I would like to understand my options here and whether my thought process is correct. I believe there is no way to deep clone to another UC in a secondary region (which would retain metadata) and then deep clone back to the new UC once it is stood up, but please correct me if I'm wrong.

If I need to manually re-create the tables via script and re-link them to the storage location, I believe I would lose metadata such as table history. I would then have to re-create groups and users and reassign them to catalogs/schemas/tables. However, my research thus far shows this as the only option.

In short, I would like to know the best route for backing up and restoring the Unity Catalog in my current situation.

1 REPLY

Kaniz
Community Manager

Hi @Daalip808, managing Unity Catalog in Azure Databricks is crucial for data governance and organization.

Let’s explore some best practices and potential options for backing up and restoring your Unity Catalog in your current situation.

  1. Unity Catalog Best Practices:

    • Data Isolation with Catalogs: Unity Catalog provides a fine-grained governance solution for data and AI on the Databricks Platform. It simplifies security and governance by providing a central place to administer and audit data access. Catalogs are the highest level in the data hierarchy managed by the Unity Catalog metastore. They represent a logical grouping of schemas, usually bounded by data access requirements. You can create catalogs for different purposes, such as production data, development data, or sensitive data.
    • Configure External Locations and Storage Credentials: When setting up your Unity Catalog, ensure that you configure external locations (such as Azure Data Lake Gen2 storage accounts) and storage credentials. This allows the catalog to manage data assets (tables, views, and volumes) and their associated permissions effectively.
    • Leverage Cluster Configurations: Unity Catalog integrates with Databricks clusters. Consider leveraging cluster configurations to optimize performance and resource utilization for your data processing workloads.
    • Audit Logs and Monitoring: Enable audit logs to track changes and access to your Unity Catalog. Monitoring these logs helps you maintain data governance and security.
    • Delta Sharing: If you need to share data in Azure Databricks with users outside your organization, consider using Delta Sharing. It uses Unity Catalog to manage and audit sharing behavior.
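As an illustration of the external-location setup mentioned above, the DDL can be composed programmatically before running it with spark.sql in a notebook. This is a minimal sketch; the location name, storage URL, and credential name below are placeholder assumptions, not values from this thread:

```python
def create_external_location_sql(name: str, url: str, credential: str) -> str:
    """Compose the Unity Catalog DDL for registering an external location.

    All three arguments are placeholders -- substitute your own location
    name, ADLS Gen2 URL, and a storage credential you have already created.
    """
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {name} "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL {credential})"
    )

# Hypothetical example for a UAT landing container; inside a Databricks
# notebook you would then execute the result with spark.sql(sql).
sql = create_external_location_sql(
    name="uat_landing",
    url="abfss://landing@uatstorageacct.dfs.core.windows.net/",
    credential="uat_mi_credential",
)
print(sql)
```

Composing the statement as a string keeps the setup scriptable across your dev/uat/prod environments, so each workspace can register its own storage account with the same helper.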
  2. Options for Backing Up and Restoring Unity Catalog:

    • Manual Recreation:
      • As you mentioned, manually re-creating tables via scripts and re-linking them to the storage location is one option. However, this approach would indeed result in losing metadata such as history, permissions, and other settings.
      • You would need to recreate groups and users and reassign them to catalogs, schemas, and tables.
    • Deep Clone to Another UC (Secondary Region):
      • Unfortunately, there is no direct built-in feature for deep cloning Unity Catalogs from one region to another while retaining metadata.
      • However, you could explore creating a new Unity Catalog in a secondary region and manually transferring the necessary metadata (such as table definitions, views, and permissions) from the existing catalog to the new one.
      • This process would involve scripting and careful migration to ensure data consistency.
    • Consider a Hybrid Approach:
      • If you have critical metadata that you cannot afford to lose, consider a hybrid approach:
        1. Backup Existing Catalog: Export relevant metadata (e.g., table definitions, views, permissions) from your current Unity Catalog.
        2. Create New Unity Catalog: Set up a new Unity Catalog in the desired configuration (e.g., using Databricks-managed storage).
        3. Restore Metadata: Manually restore the exported metadata to the new catalog.
      • While this approach requires effort, it allows you to retain critical information while transitioning to a new storage setup.
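The backup-and-restore steps above can be sketched in Python. The input shapes here are assumptions for the sketch (DDL strings as collected via SHOW CREATE TABLE, grant tuples as collected via SHOW GRANTS or information_schema.table_privileges); adapt them to however you actually export the metadata from your workspace:

```python
import json


def snapshot_metadata(table_ddl: dict, grants: list) -> str:
    """Bundle table DDL and grants into one JSON document for backup.

    table_ddl: fully qualified table name -> CREATE TABLE statement.
    grants:    (principal, privilege, object_key) tuples.
    """
    return json.dumps(
        {
            "tables": table_ddl,
            "grants": [
                {"principal": p, "privilege": priv, "object": obj}
                for p, priv, obj in grants
            ],
        },
        indent=2,
    )


def restore_statements(snapshot: str, new_catalog: str) -> list:
    """Rebuild DDL and GRANT statements against a new catalog by swapping
    the first (catalog) component of each three-level object name."""
    meta = json.loads(snapshot)
    stmts = []
    for fqn, ddl in meta["tables"].items():
        old_catalog = fqn.split(".")[0]
        # Replace only the first occurrence, i.e. the catalog prefix.
        stmts.append(ddl.replace(old_catalog, new_catalog, 1))
    for g in meta["grants"]:
        parts = g["object"].split(".")
        new_key = ".".join([new_catalog] + parts[1:])
        stmts.append(
            f"GRANT {g['privilege']} ON TABLE {new_key} TO `{g['principal']}`"
        )
    return stmts


# Hypothetical example: one table and one grant, moving dev_cat -> new_cat.
snap = snapshot_metadata(
    {"dev_cat.sales.orders": "CREATE TABLE dev_cat.sales.orders (id INT)"},
    [("data_engineers", "SELECT", "dev_cat.sales.orders")],
)
for stmt in restore_statements(snap, "new_cat"):
    print(stmt)
```

The JSON snapshot is what you would write out before deleting the old catalog; the restore step replays the statements against the new catalog name. Note this recovers definitions and permissions only, not Delta table history.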

Remember to thoroughly test any approach you choose in a non-production environment before applying it to your live system. Good luck with your data platform modernization project! 😊🚀

I’ve provided recommendations based on best practices and potential options for backing up and restoring your Unity Catalog. If you need further assistance or have additional questions, feel free to ask!