<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic: Databricks Deep Clone in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-deep-clone/m-p/65151#M32747</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am working on a DR design for Databricks on Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design must ensure that DR is managed across two regions, i.e., primary and secondary. The active (live) Databricks setup will be hosted in the primary region with its own metastore, and a similar setup will be created in the secondary region for the passive instance.&lt;/P&gt;&lt;P&gt;In this case, does Databricks Deep Clone offer cloning of UC objects across two different metastores, one per region? If not, is there an alternative that meets this DR objective?&lt;/P&gt;</description>
    <pubDate>Mon, 01 Apr 2024 06:30:39 GMT</pubDate>
    <dc:creator>SenthilJ</dc:creator>
    <dc:date>2024-04-01T06:30:39Z</dc:date>
    <item>
      <title>Databricks Deep Clone</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-deep-clone/m-p/65151#M32747</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am working on a DR design for Databricks on Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design must ensure that DR is managed across two regions, i.e., primary and secondary. The active (live) Databricks setup will be hosted in the primary region with its own metastore, and a similar setup will be created in the secondary region for the passive instance.&lt;/P&gt;&lt;P&gt;In this case, does Databricks Deep Clone offer cloning of UC objects across two different metastores, one per region? If not, is there an alternative that meets this DR objective?&lt;/P&gt;</description>
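For context, the Deep Clone the question refers to is a single SQL statement when source and target live in the same metastore. A minimal sketch that builds that statement (the three-level catalog.schema.table names are hypothetical; on a Databricks cluster the result would be run via `spark.sql(...)`):

```python
def deep_clone_sql(source: str, target: str) -> str:
    """Build the Databricks SQL statement for a Deep Clone.

    A deep clone copies both the metadata and the data files of a Delta
    table, so the target becomes a fully independent copy of the source.
    """
    return f"CREATE OR REPLACE TABLE {target} DEEP CLONE {source}"

# Hypothetical Unity Catalog names: catalog.schema.table
stmt = deep_clone_sql("prod_catalog.sales.orders", "dr_catalog.sales.orders")
print(stmt)
# On Databricks this would be executed as: spark.sql(stmt)
```

Because the statement names both tables through the catalog, it only resolves when both sit under the same metastore, which is exactly the limitation discussed below.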
      <pubDate>Mon, 01 Apr 2024 06:30:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-deep-clone/m-p/65151#M32747</guid>
      <dc:creator>SenthilJ</dc:creator>
      <dc:date>2024-04-01T06:30:39Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Deep Clone</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-deep-clone/m-p/122453#M46777</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/57225"&gt;@SenthilJ&lt;/a&gt; - May I know whether you received any responses, or any offline support, to complete this activity?&lt;/P&gt;</description>
      <pubDate>Sun, 22 Jun 2025 07:44:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-deep-clone/m-p/122453#M46777</guid>
      <dc:creator>phanisub</dc:creator>
      <dc:date>2025-06-22T07:44:31Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Deep Clone</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-deep-clone/m-p/122457#M46779</link>
      <description>&lt;P class=""&gt;Hi,&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN class=""&gt;In my opinion, Databricks Deep Clone does not currently support cloning Unity Catalog tables natively across different metastores&lt;/SPAN&gt;&amp;nbsp;(each region having its own metastore). Deep Clone requires that both source and target belong to the same metastore context, so this approach won’t work out of the box for your DR strategy across primary and secondary regions.&lt;/P&gt;&lt;P class=""&gt;That said, here are a few &lt;SPAN class=""&gt;alternative approaches&lt;/SPAN&gt; you could consider for achieving your DR objective:&lt;/P&gt;&lt;H3&gt;&lt;STRONG&gt;1. &lt;/STRONG&gt;&lt;STRONG&gt;Delta Sharing between metastores&lt;/STRONG&gt;&lt;/H3&gt;&lt;P class=""&gt;You could use &lt;SPAN class=""&gt;&lt;STRONG&gt;Delta Sharing&lt;/STRONG&gt;&lt;/SPAN&gt; to expose the source tables from the primary region and then recreate or hydrate them in the secondary region. Delta Sharing supports &lt;SPAN class=""&gt;&lt;STRONG&gt;cross-account and cross-region sharing&lt;/STRONG&gt;&lt;/SPAN&gt;, even across clouds.&lt;/P&gt;&lt;P class=""&gt;However, it’s worth noting that Delta Sharing is optimized for &lt;I&gt;data access and interoperability&lt;/I&gt;, not necessarily for &lt;I&gt;high-throughput replication&lt;/I&gt;, and performance can be a concern — especially for large or frequently changing tables.&lt;/P&gt;&lt;H3&gt;&lt;STRONG&gt;2. &lt;/STRONG&gt;&lt;STRONG&gt;File-level replication (e.g., AzCopy, Azure Data Factory)&lt;/STRONG&gt;&lt;/H3&gt;&lt;P class=""&gt;Another robust approach is to &lt;SPAN class=""&gt;&lt;STRONG&gt;replicate the underlying Delta Lake files&lt;/STRONG&gt;&lt;/SPAN&gt; using tools like &lt;SPAN class=""&gt;&lt;STRONG&gt;AzCopy&lt;/STRONG&gt;&lt;/SPAN&gt; or &lt;SPAN class=""&gt;&lt;STRONG&gt;Azure Data Factory&lt;/STRONG&gt;&lt;/SPAN&gt;, similar to what AWS DataSync provides.&lt;/P&gt;&lt;P class=""&gt;This method is:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;Cost-effective&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Cross-account and cross-region&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Storage-native (no Databricks compute required during transfer)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Once the data is in the target region’s storage account, you can register the tables manually (or via automation) in the secondary Unity Catalog metastore. This essentially gives you a snapshot of the latest state of your tables.&lt;/P&gt;&lt;H3&gt;&lt;STRONG&gt;3. &lt;/STRONG&gt;&lt;STRONG&gt;Snapshots + Restore&lt;/STRONG&gt;&lt;/H3&gt;&lt;P class=""&gt;If you’re using &lt;SPAN class=""&gt;&lt;STRONG&gt;ADLS Gen2 with versioning or backup policies&lt;/STRONG&gt;&lt;/SPAN&gt;, you can take advantage of storage-level snapshots. In a DR event, you could restore those snapshots into a separate container or region and then rehydrate the tables in Databricks.&lt;/P&gt;&lt;P class=""&gt;This method is slower in terms of RTO but can serve as a last-resort recovery strategy.&lt;/P&gt;&lt;P class=""&gt;Hope this helps, &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/P&gt;</description>
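The file-level replication route (option 2 above) reduces to two steps: mirror the Delta table's directory (data files plus `_delta_log`) to the secondary region's storage, then register that location as an external table in the secondary metastore. A minimal sketch of the two commands involved; all storage account names, container paths, and table names are hypothetical placeholders:

```python
def azcopy_command(source_url: str, dest_url: str) -> str:
    """Build an AzCopy invocation that mirrors a Delta table directory
    (data files plus _delta_log) to the secondary region's storage."""
    return f"azcopy copy '{source_url}' '{dest_url}' --recursive"


def register_external_table_sql(table: str, location: str) -> str:
    """Build the SQL that registers the replicated files as an external
    Delta table in the secondary Unity Catalog metastore."""
    return f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{location}'"


# Hypothetical primary/secondary ADLS Gen2 locations
cmd = azcopy_command(
    "https://primarysa.blob.core.windows.net/data/sales/orders",
    "https://secondarysa.blob.core.windows.net/data/sales/orders",
)
ddl = register_external_table_sql(
    "dr_catalog.sales.orders",
    "abfss://data@secondarysa.dfs.core.windows.net/sales/orders",
)
print(cmd)  # run from a VM or pipeline with access to both accounts
print(ddl)  # run via spark.sql(ddl) on a workspace attached to the DR metastore
```

In practice the copy step would be scheduled (e.g., as an Azure Data Factory pipeline) and the registration step automated over the list of tables to protect; the sketch only shows the shape of each command.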
      <pubDate>Sun, 22 Jun 2025 10:51:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-deep-clone/m-p/122457#M46779</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-06-22T10:51:22Z</dc:date>
    </item>
  </channel>
</rss>

