
Databricks Deep Clone

SenthilJ
New Contributor III

Hi,

I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone Unity Catalog tables (within or across catalogs). My design needs to handle DR across two regions, i.e., primary and secondary: the active (live) Databricks setup will be hosted in the primary region with its own metastore, and a similar setup will be deployed in the secondary region for the passive instance.

In this case, does Databricks Deep Clone offer cloning of UC objects across two different metastores, one hosted in each of the primary and secondary regions? If not, is there an alternative that meets this DR objective?

2 REPLIES

phanisub
New Contributor II

@SenthilJ - May I know if you received any responses or offline support to get this done?

Isi
Honored Contributor II

Hi,

In my opinion, Databricks Deep Clone does not currently support cloning Unity Catalog tables natively across different metastores (each region having its own metastore). Deep Clone requires that both source and target belong to the same metastore context, so this approach won't work out of the box for your DR strategy across primary and secondary regions.
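For context, within a single metastore a deep clone is just one SQL statement. Here is a minimal sketch, run from a Databricks notebook where `spark` is predefined; the catalog, schema, and table names are placeholders:

```python
# Deep clone of a Unity Catalog table within the SAME metastore.
# prod_catalog / dr_catalog and the table names are placeholders.
spark.sql("""
    CREATE OR REPLACE TABLE dr_catalog.sales.orders
    DEEP CLONE prod_catalog.sales.orders
""")
```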

That said, here are a few alternative approaches you could consider for achieving your DR objective:

1. Delta Sharing between metastores

You could use Delta Sharing to expose the source tables from the primary region and then recreate or hydrate them in the secondary region. Delta Sharing supports cross-account and cross-region sharing, even across clouds.

However, it's worth noting that Delta Sharing is optimized for data access and interoperability, not necessarily for high-throughput replication, and performance can be a concern, especially for large or frequently changing tables.
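As a rough sketch, assuming Databricks-to-Databricks sharing, it could look like the following; the share, recipient, and catalog names and the sharing identifier are placeholders:

```python
# Provider side (primary region's metastore): create a share, add a table,
# and grant it to a recipient keyed by the secondary metastore's sharing ID.
spark.sql("CREATE SHARE IF NOT EXISTS dr_share")
spark.sql("ALTER SHARE dr_share ADD TABLE prod_catalog.sales.orders")
spark.sql("CREATE RECIPIENT IF NOT EXISTS dr_recipient USING ID 'azure:<region>:<metastore-id>'")
spark.sql("GRANT SELECT ON SHARE dr_share TO RECIPIENT dr_recipient")

# Consumer side (secondary region's metastore): mount the share as a catalog.
spark.sql("CREATE CATALOG IF NOT EXISTS dr_catalog USING SHARE provider_name.dr_share")
```

Because shared tables are read-only on the consumer side, a DR setup would typically then materialize them into local tables in the secondary region (e.g., via CTAS) as part of the hydration step.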

2. File-level replication (e.g., AzCopy, Azure Data Factory)

Another robust approach is to replicate the underlying Delta Lake files using tools like AzCopy or Azure Data Factory, similar to what AWS DataSync provides on AWS; see the sketch after the list below.

This method is:

  • Cost-effective

  • Cross-account and cross-region

  • Storage-native (no Databricks compute required during transfer)
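A minimal sketch of driving such a copy from Python, assuming AzCopy is installed and reachable on the PATH; the account names, container paths, and SAS tokens are placeholders:

```python
import subprocess

# Server-side copy of a Delta table's directory between two storage accounts.
# Account names, container paths, and SAS tokens below are placeholders.
src = "https://primaryacct.blob.core.windows.net/delta/sales/orders?<sas-token>"
dst = "https://secondaryacct.blob.core.windows.net/delta/sales/orders?<sas-token>"

subprocess.run(["azcopy", "copy", src, dst, "--recursive"], check=True)
```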

 

Once the data is in the target region's storage account, you can register the tables manually (or via automation) in the secondary Unity Catalog metastore. This essentially gives you a snapshot of the latest state of your tables.
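For example, registering a replicated table as an external table might look like this; the catalog, schema, and abfss path are placeholders, and an external location covering that path is assumed to already exist:

```python
# Register the copied Delta files as an external table in the secondary
# region's Unity Catalog metastore. All names and paths are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dr_catalog.sales.orders
    USING DELTA
    LOCATION 'abfss://delta@secondaryacct.dfs.core.windows.net/sales/orders'
""")
```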

 

3. Snapshots + Restore

If you're using ADLS Gen2 with versioning or backup policies, you can take advantage of storage-level snapshots. In a DR event, you could restore those snapshots into a separate container or region and then rehydrate the tables in Databricks.

This method has a slower RTO, but it can serve as a last-resort recovery strategy.

 

Hope this helps 🙂

Isi
