Data Engineering
Copying Delta table from QA to Prod

MachuPichu
New Contributor III

I am looking for a way to copy a large managed Delta table (about 4 TB) from one environment (QA) to another (Prod). QA and Prod are in different subscriptions and in different regions. I understand Databricks provides a way to clone tables, but I am not sure whether cloning works across subscriptions. There is network connectivity between QA and Prod in case files need to be copied from the lower to the higher environment. I am sure I am not the first person trying to copy tables across environments. Can you share how you performed such a copy/migration?

9 REPLIES

sher
Valued Contributor II

USE DEEP CLONE:

-- Create a deep clone of /data/source at /data/target
CREATE TABLE delta.`/data/target/` CLONE delta.`/data/source/`;

ref link: https://docs.databricks.com/optimizations/clone.html
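Since a clone can target an explicit storage path, one way to use it across workspaces is to clone into a storage location both environments can reach, then clone from that location on the other side. A minimal sketch, assuming a shared storage account reachable from both subscriptions (all storage paths and table names here are hypothetical):

```sql
-- On the QA workspace: deep clone the managed table to a shared storage path
-- (storage account, container, and table names are hypothetical)
CREATE OR REPLACE TABLE delta.`abfss://transfer@sharedstorage.dfs.core.windows.net/big_table`
DEEP CLONE qa_db.big_table;

-- On the Prod workspace: deep clone from that path into a managed table
CREATE OR REPLACE TABLE prod_db.big_table
DEEP CLONE delta.`abfss://transfer@sharedstorage.dfs.core.windows.net/big_table`;
```

The Prod workspace needs read access to the shared storage path for the second statement to work.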

MachuPichu
New Contributor III

Does it support cloning across subscriptions? If so, can you share an example?

Cami
Contributor III

I don't know if it would be an ideal option, but please read more about Unity Catalog and Delta Sharing. DEEP CLONE sounds good.

MachuPichu
New Contributor III

We are not using Unity Catalog. This is still based on the Hive metastore.

daniel_sahal
Honored Contributor III

@Ratnadeep Bose

The best way would be to create a storage account that will be used to copy the data between the two environments:

  1. Create an external table on Sub1 with the same schema as the source table, with its location on the shared storage,
  2. Insert into the external table from the source table,
  3. Create an external table on Sub2 pointing at the same storage location.

That way you have the same data in both subscriptions.
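The steps above can be sketched as follows. This is a sketch, not a definitive implementation; the storage path, database, and table names are hypothetical, and steps 1 and 2 are combined into a single CTAS statement:

```sql
-- Steps 1 + 2 (QA / Sub1): create an external Delta table on the shared
-- storage and fill it from the source table in one statement
-- (storage path and table names are hypothetical)
CREATE TABLE qa_db.big_table_transfer
USING DELTA
LOCATION 'abfss://transfer@sharedstorage.dfs.core.windows.net/big_table'
AS SELECT * FROM qa_db.big_table;

-- Step 3 (Prod / Sub2): register the same location as an external table;
-- the schema is read from the Delta transaction log, so no column list is needed
CREATE TABLE prod_db.big_table
USING DELTA
LOCATION 'abfss://transfer@sharedstorage.dfs.core.windows.net/big_table';
```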

Just to be clear, we are using a managed Delta table, not an external table. I am not sure whether the above solution will still work. Thanks very much for your feedback.

daniel_sahal
Honored Contributor III

@Ratnadeep Bose

That's why I mentioned creating an external table: a table used only for the data copy between the two environments. It should be a copy of the source table, but with its location on the shared storage.

Hubert-Dudek
Esteemed Contributor III

I would use Azure Data Factory to copy the 4 TB of files, as it has very high throughput. After the copy completes, I would register the data as a table in the new metastore.
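Assuming ADF copies the whole table directory (including the _delta_log folder) intact, the registration step could look like this. A sketch only; the storage path and table names are hypothetical:

```sql
-- Register the copied Delta location as an external table in the Prod metastore
-- (path and names are hypothetical; assumes _delta_log was copied along with the data)
CREATE TABLE prod_db.big_table_ext
USING DELTA
LOCATION 'abfss://data@prodstorage.dfs.core.windows.net/big_table';

-- Optionally materialize it as a managed table afterwards
CREATE TABLE prod_db.big_table
DEEP CLONE prod_db.big_table_ext;
```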

I thought about using ADF. Since we are using a managed Delta table, I am not sure how to register a table based on external data. Any ideas?
