01-21-2023 11:36 AM
I am looking for a way to copy a large managed Delta table (about 4 TB) from one environment (QA) to another (Prod). QA and Prod are in different subscriptions and in different regions. I understand that Databricks provides a way to clone tables, but I am not sure whether cloning works across subscriptions. There is network connectivity between QA and Prod in case files need to be copied from the lower to the higher environment. I am sure I am not the first person trying to copy tables across environments. Can you share how you performed such a copy or migration?
01-22-2023 04:37 AM
Use DEEP CLONE (a plain CLONE is a deep clone by default):
CREATE TABLE delta.`/data/target/` CLONE delta.`/data/source/` -- Create a deep clone of /data/source at /data/target
ref link: https://docs.databricks.com/optimizations/clone.html
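For a copy that has to land in another subscription, the same statement can target a cloud storage path instead of a local one. Below is a minimal sketch, assuming the QA cluster can already authenticate to the target storage account. The table name `qa_db.big_table`, the container `staging`, and the account `prodstorage` are hypothetical placeholders, and the helper only builds the SQL text so it can be checked before running it.

```python
def deep_clone_to_path_sql(source_table: str, target_path: str) -> str:
    """Build a Delta DEEP CLONE statement that writes a full copy to a storage path."""
    # Backticks delimit the path in the statement, so reject a path containing one.
    if "`" in target_path:
        raise ValueError("target_path must not contain backticks")
    return (
        f"CREATE OR REPLACE TABLE delta.`{target_path}` "
        f"DEEP CLONE {source_table}"
    )


# Hypothetical names: replace with your own table and storage path.
SOURCE_TABLE = "qa_db.big_table"
TARGET_PATH = "abfss://staging@prodstorage.dfs.core.windows.net/big_table"

statement = deep_clone_to_path_sql(SOURCE_TABLE, TARGET_PATH)
print(statement)

# On a Databricks cluster in QA that can write to the target storage
# account, the statement would be executed with:
#     spark.sql(statement)
```

Whether the write succeeds across subscriptions depends on the credentials and network rules configured for the storage account, not on the clone command itself.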
01-22-2023 07:41 PM
Does it support cloning across subscriptions? If so, can you share an example?
01-22-2023 07:52 AM
I don't know if it would be an ideal option, but please read more about Unity Catalog and Delta Sharing. DEEP CLONE sounds good.
01-22-2023 07:43 PM
We are not using Unity Catalog. This is still based on the Hive metastore.
01-23-2023 01:34 AM
@Ratnadeep Bose
The best way would be to create a storage location that is used to copy the data between the two environments.
That way you have the same data available in both subscriptions.
01-24-2023 12:29 PM
Just to be clear, we are using a managed Delta table, not an external table. I am not sure if the above solution will still work. Thanks very much for your feedback.
01-24-2023 10:32 PM
@Ratnadeep Bose
That's why I mentioned creating an external table that will be used to copy the data between the two environments. It should be a copy of the source table, but with its location on that storage.
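On the Prod side, the staged copy can then be cloned into a managed table. The sketch below assumes the staged Delta table sits at a path the Prod cluster can read. The path, the account name `sharedstorage`, and the table name `prod_db.big_table` are hypothetical. The helpers only build the SQL text. Running it requires a Databricks cluster.

```python
def clone_into_managed_table_sql(staging_path: str, managed_table: str) -> str:
    """Build a DEEP CLONE statement that copies a path-based Delta table
    into a metastore table."""
    if "`" in staging_path:
        raise ValueError("staging_path must not contain backticks")
    # No LOCATION clause is given, so the metastore chooses where the data
    # lives, which makes the resulting table a managed table.
    return (
        f"CREATE TABLE IF NOT EXISTS {managed_table} "
        f"DEEP CLONE delta.`{staging_path}`"
    )


def row_count_sql(table_ref: str) -> str:
    """Build a row count query, useful as a sanity check after the copy."""
    return f"SELECT COUNT(*) AS row_count FROM {table_ref}"


# Hypothetical names: replace with your own staging path and target table.
STAGING_PATH = "abfss://staging@sharedstorage.dfs.core.windows.net/big_table"
MANAGED_TABLE = "prod_db.big_table"

clone_statement = clone_into_managed_table_sql(STAGING_PATH, MANAGED_TABLE)
check_statement = row_count_sql(MANAGED_TABLE)
print(clone_statement)
print(check_statement)

# On a Prod cluster, each statement would be executed with spark.sql(...).
```

Comparing the row count in Prod with the count of the source table in QA is a cheap way to confirm that the copy is complete.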
01-23-2023 05:24 AM
I would use Azure Data Factory to copy the 4 TB of files, as it has very high throughput. After everything has been copied, I would register the data as a table in the new metastore.
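The registration step could look roughly like the sketch below. It assumes the copy included the whole table directory, including the `_delta_log` folder, and that nothing wrote to the source table while the copy was running. The table name and the `abfss://` location are hypothetical. The helper only builds the SQL text so it can be reviewed first.

```python
def register_external_delta_sql(table_name: str, location: str) -> str:
    """Build a statement that registers existing Delta files as a table."""
    # The location is wrapped in single quotes, so reject a value containing one.
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING DELTA LOCATION '{location}'"
    )


# Hypothetical names: replace with your own table and copied directory.
TABLE_NAME = "prod_db.big_table_ext"
COPIED_LOCATION = "abfss://landing@prodstorage.dfs.core.windows.net/big_table"

register_statement = register_external_delta_sql(TABLE_NAME, COPIED_LOCATION)
print(register_statement)

# On a Prod cluster, the statement would be executed with:
#     spark.sql(register_statement)
```

This registers an external table that points at the copied files. If a managed table is required, the registered table can afterwards be copied into one, for example with a deep clone or a CREATE TABLE ... AS SELECT.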
01-24-2023 12:33 PM
I thought about using ADF. Since we are using a managed Delta table, I am not sure how you can register a table based on external data. Any ideas?