Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Copying Delta table from QA to Prod

MachuPichu
New Contributor III

I am looking for a way to copy a large managed Delta table (around 4 TB) from one environment (QA) to another (Prod). QA and Prod are in different subscriptions and in different regions. I understand Databricks provides a way to clone a table, but I am not sure whether cloning works across subscriptions. There is network connectivity between QA and Prod, in case files need to be copied from the lower to the higher environment. I am sure I am not the first person trying to copy tables across environments. Can you share how you performed such a copy/migration?

9 REPLIES

sher
Valued Contributor II

Use DEEP CLONE:

-- Create a deep clone of /data/source at /data/target
CREATE TABLE delta.`/data/target/` CLONE delta.`/data/source/`

ref link: https://docs.databricks.com/optimizations/clone.html
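A clone can also target an explicit storage path, so if the Prod storage account is reachable from the QA workspace, one option is to clone directly to a cross-account location. A minimal sketch, assuming networking and credentials are in place (the `abfss://` URI, account, and table names below are placeholders, not your actual paths):

```sql
-- Deep clone the QA managed table into a path on the Prod storage account.
-- The QA cluster must have network access and storage credentials for the
-- target container; the path and table names are hypothetical.
CREATE OR REPLACE TABLE delta.`abfss://prod-container@prodaccount.dfs.core.windows.net/data/my_table`
DEEP CLONE qa_db.my_table;
```

On the Prod side you would then register that path as a table in the Prod metastore.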

MachuPichu
New Contributor III

Does it support cloning across subscriptions? If so, can you share an example?

Cami
Contributor III

I don't know if it would be an ideal option, but please read up on Unity Catalog and Delta Sharing. DEEP CLONE sounds good.

MachuPichu
New Contributor III

We are not using Unity Catalog; this is still based on the Hive metastore.

daniel_sahal
Esteemed Contributor

@Ratnadeep Bose​ 

The best way would be to create a storage account that both environments can reach and use it to copy the data:

  1. In Sub1, create an external table with the same schema as the source table, with its location on that shared storage,
  2. Insert into the external table from the source table,
  3. In Sub2, create an external table with the same schema, pointing at the same location.

Thanks to that, you've got the same data in both subscriptions.
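The steps above can be sketched as follows (a hedged example, not the poster's exact code; the staging path, database, and table names are hypothetical):

```sql
-- 1.-2. On the QA (Sub1) workspace: stage the managed table as an external
--       table on storage reachable from both subscriptions.
CREATE TABLE qa_db.my_table_staging
LOCATION 'abfss://staging@sharedaccount.dfs.core.windows.net/my_table'
AS SELECT * FROM qa_db.my_table;

-- 3. On the Prod (Sub2) workspace: register the staged Delta files.
CREATE TABLE prod_db.my_table_staging
USING DELTA
LOCATION 'abfss://staging@sharedaccount.dfs.core.windows.net/my_table';
```

If Prod should end up with a managed table, a final `CREATE TABLE … AS SELECT` from the staging table copies the data into the Prod metastore's managed location.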

Just to be clear, we are using managed Delta tables, not external tables, so I am not sure the above solution will still work. Thanks very much for your feedback.

@Ratnadeep Bose​ 

That's why I mentioned creating an external table as the vehicle for copying data between the two environments. It should be a copy of the source table, but with its location on the shared storage.

Hubert-Dudek
Esteemed Contributor III

I would use Azure Data Factory to copy the 4 TB of files, as it has gigantic throughput. After the copy completes, I would register the files as a table in the new metastore.
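Registering the copied files works because a managed Delta table's files (data plus `_delta_log`) are ordinary Delta files once copied. A minimal sketch, assuming ADF has copied the whole table directory intact (the path and names are placeholders):

```sql
-- After ADF has copied the Delta directory (including _delta_log) to the
-- Prod storage account, register it in the Prod metastore.
CREATE TABLE prod_db.my_table
USING DELTA
LOCATION 'abfss://data@prodaccount.dfs.core.windows.net/my_table';

-- Optionally materialize it as a managed table afterwards:
-- CREATE TABLE prod_db.my_table_managed AS SELECT * FROM prod_db.my_table;
```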

I thought about using ADF. Since we are using a managed Delta table, I am not sure how you would register a table on top of the copied files. Any idea?
