Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to copy UC managed table files from the Databricks-associated storage account, along with the _delta_log folder

ajay_wavicle
Databricks Partner

I want to migrate managed tables from one cloud Databricks workspace to another as-is, with the full Delta history. I can do this for external tables since I have access to the storage account container folder, but that is not the case for UC managed tables. How do I copy the files from one storage account to another?

4 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @ajay_wavicle ,

I think you can try DEEP CLONE. A deep clone copies the source table's data to the clone target in addition to the metadata of the existing table.

Clone a table on Databricks | Databricks on AWS
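A minimal sketch (catalog, schema, and table names below are placeholders, not from your environment):

```sql
-- Copies data files and table metadata into a new, independent table
CREATE TABLE target_catalog.target_schema.my_table
  DEEP CLONE source_catalog.source_schema.my_table;
```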

lucami
Contributor

Hi @szymon_dybczak,

I suggest the following:

  1. Create storage credential
  2. Register an external location for new storage location
  3. Create the catalog with a managed location
  4. Migrate table with full Delta history using DEEP CLONE


-- Azure example
CREATE EXTERNAL LOCATION extloc_target
  URL 'abfss://<container>@<account>.dfs.core.windows.net/<prefix>'
  WITH (STORAGE CREDENTIAL cred_mi)
  COMMENT 'Target path for managed catalog';


CREATE CATALOG my_catalog
  MANAGED LOCATION 'abfss://<container>@<account>.dfs.core.windows.net/<prefix>';

CREATE TABLE my_catalog.core.my_table
DEEP CLONE src_catalog.core.my_table;

ajay_wavicle
Databricks Partner

@lucami @szymon_dybczak I tried DEEP CLONE as suggested above, but when I run DESCRIBE HISTORY on the new cloned table I get only one row. How do I make the history the same as before?

SteveOstrowski
Databricks Employee

Hi @ajay_wavicle,

Migrating UC managed tables between workspaces while preserving delta history requires a different approach than external tables, because Unity Catalog controls the underlying storage for managed tables and you do not have direct access to the files. Here are your options, depending on your exact requirements:

OPTION 1: DEEP CLONE (recommended for most migration scenarios)

If both workspaces share the same Unity Catalog metastore (i.e., they are in the same cloud region), you can run a DEEP CLONE directly from the source table to a new managed table in the destination catalog/schema:

CREATE TABLE destination_catalog.schema.my_table
DEEP CLONE source_catalog.schema.my_table;

This copies all data files and table metadata (schema, partitioning, invariants, nullability) into a new independent managed table. One important caveat: DEEP CLONE does not preserve the source table's transaction log history. The cloned table starts with a fresh version history. If you need time travel against the original versions, you would need to keep the source table available.
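As a quick sanity check (table names below are placeholders), you can compare histories after cloning:

```sql
-- The clone's history typically starts fresh, with the CLONE operation as its first version
DESCRIBE HISTORY destination_catalog.schema.my_table;

-- The source table still carries its full version history
DESCRIBE HISTORY source_catalog.schema.my_table;
```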

Documentation: https://docs.databricks.com/en/delta/clone.html

OPTION 2: SAME METASTORE, CROSS-WORKSPACE ACCESS

If both workspaces are attached to the same Unity Catalog metastore, you do not need to copy files at all. Any managed table registered in Unity Catalog is automatically accessible from any workspace connected to that metastore. Simply grant the appropriate permissions to the users/groups in the destination workspace and they can query the table directly.
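For example, assuming a group named `data_engineers` exists in the destination workspace (group and object names here are placeholders), the grants might look like:

```sql
GRANT USE CATALOG ON CATALOG source_catalog TO `data_engineers`;
GRANT USE SCHEMA  ON SCHEMA  source_catalog.schema TO `data_engineers`;
GRANT SELECT      ON TABLE   source_catalog.schema.my_table TO `data_engineers`;
```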

OPTION 3: CROSS-REGION OR CROSS-CLOUD MIGRATION (Delta Sharing)

If the workspaces are in different regions or on different cloud providers, use Databricks-to-Databricks Delta Sharing:

1. In the source workspace, create a share and add the tables you want to migrate.
2. In the destination workspace, create a recipient and catalog from the share.
3. Once shared, you can DEEP CLONE from the shared catalog into a local managed table in the destination workspace.
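A sketch of those steps in SQL, assuming Databricks-to-Databricks sharing (share, recipient, provider, and catalog names are placeholders; the sharing identifier comes from the destination metastore):

```sql
-- Source workspace: create the share, add the table, and grant it to a recipient
CREATE SHARE migration_share;
ALTER SHARE migration_share ADD TABLE source_catalog.schema.my_table;
CREATE RECIPIENT dest_recipient USING ID '<destination-sharing-identifier>';
GRANT SELECT ON SHARE migration_share TO RECIPIENT dest_recipient;

-- Destination workspace: mount the share as a catalog, then clone locally
CREATE CATALOG shared_src USING SHARE provider_name.migration_share;
CREATE TABLE destination_catalog.schema.my_table
  DEEP CLONE shared_src.schema.my_table;
```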

Note that Delta Sharing itself provides read-only access, so the DEEP CLONE step creates a local independent copy. Also be aware of egress charges for cross-region transfers.

Documentation: https://docs.databricks.com/en/data-sharing/index.html

OPTION 4: EXPORT AND RE-IMPORT VIA EXTERNAL STORAGE

If you need the raw files (including the _delta_log folder) for a non-Databricks migration or as a backup, you can:

1. DEEP CLONE the managed table to an external location you control:

CREATE TABLE delta.`abfss://container@account.dfs.core.windows.net/path/my_table`
DEEP CLONE source_catalog.schema.my_table;

2. Then copy those files (data files + _delta_log) from that external location to the destination storage using AzCopy, the Azure CLI, or any blob-level copy tool.

3. In the destination workspace, register the table:

CREATE TABLE destination_catalog.schema.my_table
USING DELTA
LOCATION 'abfss://container@account.dfs.core.windows.net/destination_path/my_table';

This gives you the full file-level copy with the _delta_log folder intact.
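Once registered, you can confirm the history came across (table name is a placeholder):

```sql
-- Because the _delta_log folder was copied verbatim, the original versions
-- should be visible here (subject to log retention on the source table)
DESCRIBE HISTORY destination_catalog.schema.my_table;
```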

IMPORTANT NOTE ON HISTORY

None of these methods carry over the full original transaction log history from the source managed table. DEEP CLONE copies data as of the latest version and creates a new independent history. If retaining the complete version history is critical, you would need to maintain the original table or use external table locations where you manage the storage directly.

WHICH OPTION TO CHOOSE

- Same metastore, same region: Option 2 (no copy needed) or Option 1 (if you want an independent copy)
- Different metastore or region: Option 3 (Delta Sharing + DEEP CLONE)
- Need raw file-level access: Option 4 (clone to external location, then copy files)

Let me know if you have questions about which scenario fits your setup.

* This reply was drafted with an agent system I built, which researches responses using the documentation I have available and previous memory. I personally review each draft for obvious issues and for system reliability, and update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.