<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How Deep clone works in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-deep-clone-works/m-p/155236#M54217</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;For DR purposes, we have set up a deep clone using Delta Sharing. Each time the deep clone job runs, it executes the query&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;create&amp;nbsp;or&amp;nbsp;replace&amp;nbsp;table&amp;nbsp;{schema}.{table}&amp;nbsp;deep&amp;nbsp;clone&amp;nbsp;{delta_share}.{schema}.{table}&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;The first time the job ran, it took a few hours to complete, but subsequent runs have been completing in about 15 minutes.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;From my understanding, deep clone replaces the entire table each time, so why did the first run take a few hours while later runs finish so much faster?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Can someone please help me understand how deep clone works with Delta Sharing?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Wed, 22 Apr 2026 19:48:53 GMT</pubDate>
    <dc:creator>DineshOjha</dc:creator>
    <dc:date>2026-04-22T19:48:53Z</dc:date>
    <item>
      <title>How Deep clone works</title>
      <link>https://community.databricks.com/t5/data-engineering/how-deep-clone-works/m-p/155236#M54217</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;For DR purposes, we have set up a deep clone using Delta Sharing. Each time the deep clone job runs, it executes the query&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;create&amp;nbsp;or&amp;nbsp;replace&amp;nbsp;table&amp;nbsp;{schema}.{table}&amp;nbsp;deep&amp;nbsp;clone&amp;nbsp;{delta_share}.{schema}.{table}&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;The first time the job ran, it took a few hours to complete, but subsequent runs have been completing in about 15 minutes.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;From my understanding, deep clone replaces the entire table each time, so why did the first run take a few hours while later runs finish so much faster?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Can someone please help me understand how deep clone works with Delta Sharing?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 22 Apr 2026 19:48:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-deep-clone-works/m-p/155236#M54217</guid>
      <dc:creator>DineshOjha</dc:creator>
      <dc:date>2026-04-22T19:48:53Z</dc:date>
    </item>
    <item>
      <title>Re: How Deep clone works</title>
      <link>https://community.databricks.com/t5/data-engineering/how-deep-clone-works/m-p/155247#M54219</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Deep clone is incremental:&amp;nbsp;&lt;SPAN&gt;any subsequent DEEP CLONE copies only the new data files.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;Although the&amp;nbsp;CREATE OR REPLACE syntax looks like a full overwrite, Delta Lake's DEEP CLONE tracks the &lt;STRONG&gt;Delta log (transaction history)&lt;/STRONG&gt; of the source table, not just the data files. Specifically, it records the &lt;STRONG&gt;last cloned version&lt;/STRONG&gt; of the source table in the clone's own Delta log.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;1st run (full copy):&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;No previous clone metadata exists&lt;/LI&gt;&lt;LI&gt;Databricks must copy &lt;STRONG&gt;all Parquet data files&lt;/STRONG&gt; from the Delta Share source to the target location&lt;/LI&gt;&lt;LI&gt;Also copies the full Delta transaction log&lt;/LI&gt;&lt;LI&gt;Time is proportional to total table size -&amp;gt; hence hours&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;STRONG&gt;Subsequent runs:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;Databricks reads the clone's Delta log to determine the &lt;STRONG&gt;last successfully cloned version&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;It then asks the Delta Share source: &lt;EM&gt;"What changed since version X?"&lt;/EM&gt;&lt;/LI&gt;&lt;LI&gt;Only &lt;STRONG&gt;new or modified files&lt;/STRONG&gt; (added/updated/deleted since that version) are physically copied&lt;/LI&gt;&lt;LI&gt;Unchanged files are referenced by the new snapshot without being re-copied&lt;/LI&gt;&lt;LI&gt;Time is proportional to the &lt;STRONG&gt;delta (change volume)&lt;/STRONG&gt; since the last run -&amp;gt; hence ~15 mins&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;You can check the following article for details:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://pl.seequality.net/power-clone-functionality-databricks-delta-tables/" rel="noopener" target="_blank"&gt;https://pl.seequality.net/power-clone-functionality-databricks-delta-tables/&lt;/A&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;If my answer was helpful, please consider marking it as the accepted solution.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 21:25:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-deep-clone-works/m-p/155247#M54219</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-04-22T21:25:20Z</dc:date>
    </item>
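The incremental behaviour described in the answer above can be checked from the clone target itself. A minimal Databricks SQL sketch; the table names (dr_catalog.sales.orders and the shared source) are placeholders, not from the thread, and the exact operationMetrics keys reported for a clone can vary by runtime version:

```sql
-- Refresh the DR copy; only files changed since the last clone are copied.
CREATE OR REPLACE TABLE dr_catalog.sales.orders
  DEEP CLONE delta_share.sales.orders;

-- Each refresh appears in the target's history as a CLONE operation.
-- operationMetrics shows how much data was physically copied: large on
-- the first (full) run, small on incremental runs.
SELECT version, operation, operationMetrics
FROM (DESCRIBE HISTORY dr_catalog.sales.orders)
WHERE operation = 'CLONE'
ORDER BY version DESC;
```

Comparing the copied-file metrics between the first CLONE commit and later ones makes the full-copy-then-incremental pattern from the answer directly visible.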
    <item>
      <title>Re: How Deep clone works</title>
      <link>https://community.databricks.com/t5/data-engineering/how-deep-clone-works/m-p/155248#M54220</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/147238"&gt;@DineshOjha&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Deep clone is incremental, not a full re-copy every time, even when you use CREATE OR REPLACE TABLE … DEEP CLONE … against a Delta Sharing table.&lt;/P&gt;
&lt;P&gt;On the first DEEP CLONE, Databricks must read the entire source table (via Delta Sharing) and copy all data files and metadata into a brand-new Delta table at the target location. This is effectively a full physical copy, so the runtime is proportional to the full table size (and any cross-region / cross-cloud egress).&lt;/P&gt;
&lt;P&gt;On subsequent runs of&amp;nbsp;CREATE OR REPLACE TABLE target DEEP CLONE source, the target is already a deep clone, with history that records which source version was last cloned.&amp;nbsp;DEEP CLONE compares the current source version to the version recorded in the target’s history, then copies only new/changed data files from the source, and also updates the target’s Delta log with a new commit that references the old files and&amp;nbsp;any newly copied ones. The commit is incremental, not a full rewrite of all files.&amp;nbsp;So your later runs only move the delta since the last clone, which is why they complete much faster.&lt;/P&gt;
&lt;P&gt;This behaviour is &lt;A href="https://docs.databricks.com/aws/en/delta-sharing/manage-egress#use-delta-deep-clone-for-incremental-replication" target="_blank"&gt;documented&lt;/A&gt; as shown in the snapshot, and the recommended pattern is exactly what you’re using.&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Ashwin_DSA_0-1776892917680.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/26317iCDCF120AAE0C4B66/image-size/large?v=v2&amp;amp;px=999" role="button" title="Ashwin_DSA_0-1776892917680.png" alt="Ashwin_DSA_0-1776892917680.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;As for the second part of your question, about how this works with Delta Sharing: from the recipient workspace’s point of view, once the share is mounted in a catalog, the shared table is just another Delta table that happens to read its files via the Delta Sharing protocol.&lt;/P&gt;
&lt;P&gt;DEEP CLONE shared_table -&amp;gt; local_table&amp;nbsp;uses Delta Sharing (signed URLs or cloud tokens) to read the source table’s data files, copies those files into the target’s storage, and creates a fully independent Delta table (the DR copy). On subsequent DEEP CLONE runs to the same target, the same incremental logic applies: only new/changed files since the last cloned source version are copied, so the job time tracks the size of the changes, not the whole table.&lt;/P&gt;
&lt;P&gt;Whilst researching the information for you, I found this &lt;A href="https://www.databricks.com/blog/2021/04/20/attack-of-the-delta-clones-against-disaster-recovery-availability-complexity.html" target="_blank"&gt;blog&lt;/A&gt; which I think is still relevant. It may not cover the recent improvements as it is from 2021 but the visuals can help you understand the workings.&lt;/P&gt;
&lt;P&gt;Hope that helps.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;EM&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/EM&gt;&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 21:27:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-deep-clone-works/m-p/155248#M54220</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-22T21:27:36Z</dc:date>
    </item>
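Tying the answers above together, the DR flow can be sketched as a single scheduled job. This is a hedged sketch with placeholder names (dr_catalog.sales.orders is an assumption, not from the thread), and the row-count comparison is just one possible sanity check, not a documented requirement:

```sql
-- Step 1: incremental refresh of the DR table from the Delta Sharing source.
CREATE OR REPLACE TABLE dr_catalog.sales.orders
  DEEP CLONE delta_share.sales.orders;

-- Step 2: the clone is now a fully independent Delta table. It no longer
-- reads through Delta Sharing, so it stays queryable even if the share or
-- the source region becomes unavailable.
SELECT COUNT(*) AS dr_row_count
FROM dr_catalog.sales.orders;

-- Optional sanity check right after the refresh: compare against the
-- shared source (counts can differ if the source changed in between).
SELECT COUNT(*) AS src_row_count
FROM delta_share.sales.orders;
```

Because the clone records the last cloned source version in its own Delta log, rerunning Step 1 on a schedule gives the incremental, change-volume-proportional runtimes described in both answers.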
  </channel>
</rss>

