
Databricks Deep Clone

SenthilJ
New Contributor III

Hi,

I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone Unity Catalog tables (within or across catalogs). My design needs to handle DR across two regions, i.e., primary and secondary: the active (live) Databricks setup will be hosted in the primary region with its own metastore, and a similar setup will be deployed in the secondary region for the passive instance.

In this case, does Databricks Deep Clone offer cloning of UC objects across two different metastores, one hosted in each of the primary and secondary regions? If not, is there an alternative that meets this DR objective?

2 REPLIES

phanisub
New Contributor II

@SenthilJ - May I know if you received any responses or offline support to get this done?

Isi
Honored Contributor II

Hi,

In my opinion, Databricks Deep Clone does not currently support cloning Unity Catalog tables natively across different metastores (each region having its own metastore). Deep Clone requires that both source and target belong to the same metastore context, so this approach won't work out of the box for your DR strategy across primary and secondary regions.
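For context, within a single metastore a deep clone is just one SQL statement. Here is a minimal sketch, run from a Databricks notebook where `spark` is predefined; the catalog, schema, and table names are placeholders:

```python
# Deep clone of a Unity Catalog table within the SAME metastore.
# prod_catalog / dr_catalog and the table names are placeholders.
spark.sql("""
    CREATE OR REPLACE TABLE dr_catalog.sales.orders
    DEEP CLONE prod_catalog.sales.orders
""")
```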

That said, here are a few alternative approaches you could consider for achieving your DR objective:

1. Delta Sharing between metastores

You could use Delta Sharing to expose the source tables from the primary region and then recreate or hydrate them in the secondary region. Delta Sharing supports cross-account and cross-region sharing, even across clouds.

However, it's worth noting that Delta Sharing is optimized for data access and interoperability, not necessarily for high-throughput replication, and performance can be a concern, especially for large or frequently changing tables.
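As a rough sketch, assuming Databricks-to-Databricks sharing, it could look like the following; the share, recipient, and catalog names and the sharing identifier are placeholders:

```python
# Provider side (primary region's metastore): create a share, add a table,
# and grant it to a recipient keyed by the secondary metastore's sharing ID.
spark.sql("CREATE SHARE IF NOT EXISTS dr_share")
spark.sql("ALTER SHARE dr_share ADD TABLE prod_catalog.sales.orders")
spark.sql("CREATE RECIPIENT IF NOT EXISTS dr_recipient USING ID 'azure:<region>:<metastore-id>'")
spark.sql("GRANT SELECT ON SHARE dr_share TO RECIPIENT dr_recipient")

# Consumer side (secondary region's metastore): mount the share as a catalog.
spark.sql("CREATE CATALOG IF NOT EXISTS dr_catalog USING SHARE provider_name.dr_share")
```

Because shared tables are read-only on the consumer side, a DR setup would typically then materialize them into local tables in the secondary region (e.g., via CTAS) as part of the hydration step.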

2. File-level replication (e.g., AzCopy, Azure Data Factory)

Another robust approach is to replicate the underlying Delta Lake files using tools like AzCopy or Azure Data Factory, similar to what AWS DataSync provides on AWS; see the sketch after the list below.

This method is:

  • Cost-effective

  • Cross-account and cross-region

  • Storage-native (no Databricks compute required during transfer)
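A minimal sketch of driving such a copy from Python, assuming AzCopy is installed and reachable on the PATH; the account names, container paths, and SAS tokens are placeholders:

```python
import subprocess

# Server-side copy of a Delta table's directory between two storage accounts.
# Account names, container paths, and SAS tokens below are placeholders.
src = "https://primaryacct.blob.core.windows.net/delta/sales/orders?<sas-token>"
dst = "https://secondaryacct.blob.core.windows.net/delta/sales/orders?<sas-token>"

subprocess.run(["azcopy", "copy", src, dst, "--recursive"], check=True)
```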

 

Once the data is in the target region's storage account, you can register the tables manually (or via automation) in the secondary Unity Catalog metastore. This essentially gives you a snapshot of the latest state of your tables.
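For example, registering a replicated table as an external table might look like this; the catalog, schema, and abfss path are placeholders, and an external location covering that path is assumed to already exist:

```python
# Register the copied Delta files as an external table in the secondary
# region's Unity Catalog metastore. All names and paths are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dr_catalog.sales.orders
    USING DELTA
    LOCATION 'abfss://delta@secondaryacct.dfs.core.windows.net/sales/orders'
""")
```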

 

3. Snapshots + Restore

If you're using ADLS Gen2 with versioning or backup policies, you can take advantage of storage-level snapshots. In a DR event, you could restore those snapshots into a separate container or region and then rehydrate the tables in Databricks.

This method has a slower RTO, but it can serve as a last-resort recovery strategy.

 

Hope this helps 🙂

Isi
