04-22-2026 02:25 PM
Hi,
Deep Clone is incremental. This means that any subsequent DEEP CLONE into the same target copies only the data files that are new since the previous clone.
Despite the CREATE OR REPLACE syntax looking like a full overwrite, Delta Lake's DEEP CLONE tracks the Delta log (transaction history) of the source table, not just the data files. Specifically, it records the last cloned version of the source table in the clone's own Delta log.
1st run (full copy):
- No previous clone metadata exists
- Databricks must copy all Parquet data files from the Delta Share source to the target location
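- Also copies the table metadata (schema, partitioning, table properties); the clone keeps its own transaction history, independent of the source's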
- Time is proportional to total table size -> hence hours
Subsequent runs:
- Databricks reads the clone's Delta log to determine the last successfully cloned version
- It then asks the Delta Share source: "What changed since version X?"
- Only files added to the source since that version are physically copied; files the source has since removed are dropped from the clone's new snapshot
- Unchanged files are referenced by the new snapshot without being re-copied
- Time is proportional to the delta (change volume) since last run -> hence ~15 mins
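To make the first-run vs. subsequent-run difference concrete, here is a toy model in plain Python. It is only a sketch of the idea, not how Databricks implements it: `ToyTable`, `ToyClone`, `deep_clone` and the `last_cloned_version` field are hypothetical names, and a simple set difference stands in for the comparison against the Delta log.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyTable:
    """Source table reduced to a version number and a set of data file names."""
    version: int = 0
    files: Set[str] = field(default_factory=set)


@dataclass
class ToyClone:
    """Clone target; last_cloned_version is a hypothetical stand-in for what the clone's log records."""
    files: Set[str] = field(default_factory=set)
    last_cloned_version: Optional[int] = None


def deep_clone(source: ToyTable, target: ToyClone) -> Dict[str, int]:
    """Bring target in line with source, counting how many files actually need copying."""
    to_copy = source.files - target.files      # present in source, missing in clone
    to_remove = target.files - source.files    # removed from source since the last clone
    reused = source.files & target.files       # already in the clone, not copied again
    target.files = set(source.files)
    target.last_cloned_version = source.version
    return {"copied": len(to_copy), "removed": len(to_remove), "reused": len(reused)}


# 1st run: nothing has been cloned yet, so every file is copied.
source = ToyTable(version=5, files={"a.parquet", "b.parquet", "c.parquet"})
clone = ToyClone()
first = deep_clone(source, clone)

# Source changes: one file removed, one file added.
source.version = 6
source.files = {"b.parquet", "c.parquet", "d.parquet"}

# 2nd run: only the new file is copied.
second = deep_clone(source, clone)
print(first)
print(second)
```

In this sketch the first call copies all three files, while the second copies one file, removes one, and reuses two, which is the same shape as the hours vs. ~15 minutes difference described above.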
You can check the following article for details:
https://pl.seequality.net/power-clone-functionality-databricks-delta-tables/
If my answer was helpful, please consider marking it as the accepted solution.