
Databricks Deep Clone

SenthilJ
New Contributor III

Hi,

I am working on a DR design for Databricks on Azure. The recommendation from Databricks is to use Deep Clone to clone Unity Catalog tables (within or across catalogs). My design must ensure that DR is managed across two regions, i.e., a primary and a secondary. The active (live) Databricks setup will be hosted in the primary region with its own metastore; a similar setup will be stood up in the secondary region for the passive instance.

In this case, does Databricks Deep Clone offer cloning of UC objects across two different metastores, one hosted in each region? If not, is there an alternative that meets this DR objective?

1 ACCEPTED SOLUTION

Kaniz_Fatma
Community Manager

Hi @SenthilJ, using Deep Clone for cloning Unity Catalog (UC) tables is indeed a prudent approach. Deep Clone enables straightforward replication of UC objects, including schemas, managed tables, access permissions, tags, and comments.
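
For reference, within a single metastore a deep clone is one SQL statement, and re-running it refreshes the target incrementally. A minimal PySpark sketch (the catalog, schema, and table names here are placeholders):

    # Run in a notebook attached to a cluster in the workspace whose
    # metastore owns both catalogs. Re-running this statement is
    # incremental: only files added to the source since the last clone
    # are copied to the target.
    spark.sql("""
        CREATE OR REPLACE TABLE dr_catalog.sales.orders
        DEEP CLONE prod_catalog.sales.orders
    """)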

    • However, let’s address your specific scenario: an active (live) Databricks setup in the primary region with its own metastore, and a similar passive setup in the secondary region.
    • As of my last knowledge update, Deep Clone operates within a single metastore. It does not inherently support cloning UC objects across two different metastores hosted in separate regions.
    • In other words, with distinct metastores—one per region—you cannot directly use Deep Clone to synchronize UC objects between them.
    • To achieve your DR objective, consider an alternative approach:
      • Automated Cloning Script: Create a custom cloning script that handles the migration of UC objects across metastores (see the sketch after this list). The script should:
        • Create a new catalog in the secondary region with the desired storage location.
        • Incrementally clone UC objects (schemas, tables, permissions, etc.) from the primary region’s catalog to the secondary region’s catalog.
        • Ensure consistency and integrity during the process.
      • Scheduled Execution: Schedule the script to run periodically, or as needed, to keep the secondary region’s catalog up to date.
      • Testing and Validation: Thoroughly test the script to validate its correctness and reliability.
    • Data Movement: Beyond UC objects, consider how the data itself (table files, volumes, etc.) will be moved between the primary and secondary regions.
    • Network Latency: Account for network latency and bandwidth constraints when synchronizing data across regions.
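
As a concrete illustration of the scripted approach, here is a rough sketch only, not an official pattern (the catalog, schema, and storage path are placeholders): from the primary-region workspace, deep clone each table to a storage path that is replicated to, or readable from, the secondary region. Re-running the same clone against the same path is incremental.

    # Sketch only: runs in the primary-region workspace, attached to the
    # primary metastore. Assumes `dr_root` is an ADLS path registered as a
    # UC external location and accessible from the secondary region.
    dr_root = "abfss://dr@mydrstorage.dfs.core.windows.net/clones"

    # Enumerate the tables to replicate (placeholder catalog/schema).
    tables = [
        row["tableName"]
        for row in spark.sql("SHOW TABLES IN prod_catalog.sales").collect()
    ]

    for name in tables:
        # Re-running DEEP CLONE against the same target path copies only
        # the files added since the previous run.
        spark.sql(f"""
            CREATE OR REPLACE TABLE delta.`{dr_root}/sales/{name}`
            DEEP CLONE prod_catalog.sales.{name}
        """)

In the secondary-region workspace, each cloned path could then be registered in that region's metastore as an external table (for example, CREATE TABLE IF NOT EXISTS dr_catalog.sales.orders USING DELTA LOCATION '<dr_root>/sales/orders'), and the whole script scheduled as a Databricks job to cover the Scheduled Execution step above.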


