Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Migrate tables from one azure databricks workspace to another

wats0ns
New Contributor III

Hello all,

I'm currently trying to move the tables contained in one Azure Databricks workspace to another, because of a change in the way we use our resource groups. I have not been able to move more than metadata with the databrickslabs/migrate repo. I was wondering if I could copy the data from within a notebook, but the JDBC connection keeps telling me to check the driver.

Is there any process to move data between workspaces?


7 REPLIES

Pat
Honored Contributor III

Hi @Quentin Maire,

We need a bit more detail:

Where is your data stored?

Are you using external or managed tables?

The migrate tool allows you to export DDL statements, not the data itself.

I can think of a few scenarios off the top of my head.

If you previously had external tables, you can create tables in the new workspace using the same ADLS path; that will let you access the data (see the sketch below).

If you used external tables but need a new location for them (storage account, etc.), you can copy the data with Azure-native tools like azcopy to the new location, then create external tables over the new location.

I think the above should work for managed tables as well; you just need to find out where the data is stored - in the DBFS root location. You can then copy the data into the new location and create external tables.

Are you planning to use Unity Catalog? You could then create a storage credential / external location using the existing table location and copy the data into managed or external tables.
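For the first scenario, a minimal sketch of what re-attaching an existing external table in the new workspace could look like, following the same CREATE EXTERNAL TABLE ... LOCATION pattern used later in this thread (the database name, columns and abfss path below are placeholders, not details from this thread):

-- hypothetical example: recreate the table in the new workspace over the same ADLS path
CREATE EXTERNAL TABLE my_database.my_table_1
(
  col_1 INT,
  col_2 STRING
)
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/path/my_table_1';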

There are a few options here, but if you used managed tables then most likely a copy is needed in your case - if the data storage needs to change as well. The copy can be done with azcopy or some of the Databricks options.

Please see here how to ingest data:

https://docs.databricks.com/ingestion/index.html

thanks,

Pat

wats0ns
New Contributor III

Hey Pat, thanks a lot for your quick answer. I'm indeed using managed tables, and would like to move the whole DB across workspaces. There is a storage account related to the workspace, but I can see several containers in it: ephemeral, jobs, logs, meta and root. Is it safe to copy them from one storage account to another, overwriting the destination's containers? Which of these containers contains the Hive data?

Thanks a lot,

Quentin

Pat
Honored Contributor III

Hi @Quentin Maire,

I haven't worked much with managed tables until now, and with Unity Catalog it is now a little bit different.

I do think the data is stored in the `root` container; you should be able to see a Hive-like table structure there:

root@something:
  table_1/part=1
  table_1/part=2

I would have to dig deeper, but I would test the data copy on one table and see.

The other containers (logs, etc.) are workspace related; I don't think they are needed when migrating.
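A minimal sanity check after copying one table, assuming a hypothetical my_database.my_table_1 recreated over the copied location, could be a simple row-count comparison run in both workspaces:

-- hypothetical check: run in the old and the new workspace and compare the results
SELECT COUNT(*) AS row_count
FROM my_database.my_table_1;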

thanks,

Pat.

wats0ns
New Contributor III

Hello Kaniz, there seems to be a Databricks system deny assignment on the IAM of the Azure storage account for the original workspace. How can I get the wasb path of the existing tables in the existing workspace without having access to the blob storage, and then get access to them?

Pat
Honored Contributor III

Hi @Quentin Maire,

You can try this statement:

SHOW CREATE DATABASE `database-name`;

It should give you the `location`, which is the default location for the managed tables.

It might be that your data is not stored in the `root` container; I forgot about this part.

Basically, when you create a database you can specify a location, and it will be used for the managed tables created in that database.

wats0ns
New Contributor III

SHOW CREATE DATABASE raises a ParseException, but with DESCRIBE DATABASE I was able to find that the DB location is dbfs:/user/hive/warehouse. I'm not sure that the SO thread proposed above works then, as the data is not accessible from outside. Is there another solution?
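For reference, a minimal sketch of the commands that can surface these locations (the names below are placeholders): DESCRIBE DATABASE EXTENDED prints the database's default location, and DESCRIBE TABLE EXTENDED shows the concrete storage path of an individual table.

-- hypothetical names; both statements are standard Spark/Databricks SQL
DESCRIBE DATABASE EXTENDED my_database;
DESCRIBE TABLE EXTENDED my_database.my_table_1;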

Pat
Honored Contributor III

@Quentin Maire, if you cannot access the data from outside, you will have to migrate it from inside.

If the data is stored in the root container and is not accessible from outside (I think you should be able to make this data accessible with Azure policies, but I don't know how to do it right now), the option is to create a separate location (storage account, container), CREATE EXTERNAL tables there, and migrate the data 1:1. For example, if you have my_database.my_table_1, then in the existing workspace you attach the new storage and migrate the data:

-- the database that holds the external copies needs to exist first
CREATE DATABASE IF NOT EXISTS my_database_ext;

CREATE EXTERNAL TABLE my_database_ext.my_table_1
(
  col_1 INT,
  col_2 STRING
)
LOCATION 'abfss://some-location/my_table_1';

INSERT INTO my_database_ext.my_table_1
SELECT * FROM my_database.my_table_1;

Then in your new workspace, either you use the external location as is:

CREATE EXTERNAL TABLE my_database.my_table_1
(
  col_1 INT,
  col_2 STRING
)
LOCATION 'abfss://some-location/my_table_1';

or you add an extra step and copy the data from these external tables into managed ones (see the sketch below).
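A minimal sketch of that extra step, reusing the placeholder names from above and a hypothetical name for the managed target table:

-- hypothetical: materialise a managed copy from the external table in the new workspace
CREATE TABLE my_database.my_table_1_managed
AS SELECT * FROM my_database.my_table_1;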

Either way, it seems like there is a need to migrate the data.

Ideally, if you could access the storage in the root container, you would create the external table on top of it. Then, in the new workspace:

INSERT INTO table_1
SELECT * FROM ext_table_1;
