Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Backup Unity Catalog and managed tables

Ashley1
Contributor

Hi All,

Can anyone point me to either documentation or a personally tried-and-tested method of backing up (and restoring) Unity Catalog and its associated managed tables? We're running on Azure and using ADLS Gen2.

Regards,

Ashley

8 REPLIES

karthik_p
Esteemed Contributor

@Ashley Betts May I ask why you need backup and restore? As a best practice, table data is usually stored in an external location as external tables, not as managed tables. If you store the data as external tables, a regular copy or backup mechanism should work. Let's see whether other community members have more input.

Pat
Honored Contributor III

Hi @karthik p,

I have to disagree.

Managed tables are the default way to create tables in Unity Catalog. These tables are stored in the Unity Catalog root storage location that you configured when you created a metastore. Databricks recommends using managed tables whenever possible to ensure support of Unity Catalog features. All managed tables use Delta Lake.

source: https://docs.databricks.com/data-governance/unity-catalog/best-practices.html#organize-your-data

@Ashley Betts, have a look at this:

Each metastore is configured with a root storage location, which is used for managed tables. You need to ensure that no users have direct access to this storage location. Giving access to the storage location could allow a user to bypass access controls in a Unity Catalog metastore and disrupt auditability. For these reasons, you should not reuse a bucket that is your current DBFS root file system or has previously been a DBFS root file system for the root storage location in your Unity Catalog metastore.

source: https://docs.databricks.com/data-governance/unity-catalog/best-practices.html#configure-a-unity-cata...

I don't have any resources on UC backup. As the passages above show, Unity Catalog (metastore) managed tables are stored in the metastore root bucket.

You should create an IAM role that Databricks uses to access that storage bucket. Basically, there shouldn't be any other mechanism for reading or writing the data (outside Databricks), so that the data can't get corrupted and nobody can bypass the access controls set in Unity Catalog. When you use Delta tables, you can use time travel to restore a previous version of a table.
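To make the time travel point concrete, here is a minimal sketch that only builds the Delta Lake SQL statements as strings. The helper names and the table name `main.sales.orders` are hypothetical; in a notebook you would pass each string to `spark.sql(...)`. Keep in mind that time travel only reaches as far back as the table's retention settings and `VACUUM` allow, so it is not a substitute for a real backup.

```python
from typing import Optional


def history_statement(table: str) -> str:
    """Statement that lists the versions available for a Delta table."""
    return f"DESCRIBE HISTORY {table}"


def restore_statement(table: str,
                      version: Optional[int] = None,
                      timestamp: Optional[str] = None) -> str:
    """Build a RESTORE statement for exactly one of version or timestamp."""
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return f"RESTORE TABLE {table} TO VERSION AS OF {int(version)}"
    return f"RESTORE TABLE {table} TO TIMESTAMP AS OF '{timestamp}'"


# Hypothetical table name, for illustration only.
print(history_statement("main.sales.orders"))
print(restore_statement("main.sales.orders", version=12))
```

You would typically run the `DESCRIBE HISTORY` statement first to pick the version or timestamp to roll back to.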

Backup seems tricky because managed tables are no longer stored in locations that correspond to their names. Instead, they have some sort of UUID, and I think the mapping from table name to location is stored in the Databricks control plane (database/backend).

I have always liked external tables, but with the UC I am leaning more towards managed tables.

thanks,

Pat.

karthik_p
Esteemed Contributor

@Pat Sienkiewicz you are right, I drifted a bit off track, I think (the recommendation to use external table storage applies without UC). With Unity Catalog, managed tables in the metastore root storage are recommended. Thank you for the post above. @Ashley Betts, Pat's response above should give you more information.

@Pat Sienkiewicz I do have one question about backup, though. I remember that for one of our Databricks E2 migrations we moved managed table data that was stored under /user/hive/warehouse. If that is possible, migrating UC metastore managed tables should also be possible.

Pat
Honored Contributor III

@karthik p Migration is a different story, I would say.

Just a few ideas: to migrate data to a different metastore / UC, one could use Delta Sharing and then transfer the data. You have many options here, for example a deep clone, an insert into a new table, etc.

Another option is to set up external locations in your current workspace/UC and export the data to external tables.
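As a rough sketch of the external-table export idea, the snippet below builds `CREATE OR REPLACE TABLE ... DEEP CLONE ... LOCATION` statements for a list of tables. Everything named here is an assumption for illustration: the `backup` catalog, the `abfss://` path, and the helper names. It also assumes the target path is already covered by a Unity Catalog external location. Each string would be run with `spark.sql(...)`.

```python
from typing import List, Optional


def deep_clone_statement(source: str, target: str,
                         location: Optional[str] = None) -> str:
    """Build a deep clone statement; LOCATION makes the target an external table."""
    stmt = f"CREATE OR REPLACE TABLE {target} DEEP CLONE {source}"
    if location:
        stmt += f" LOCATION '{location}'"
    return stmt


def backup_plan(tables: List[str], backup_catalog: str, base_path: str) -> List[str]:
    """Map each catalog.schema.table to a clone in the backup catalog and path."""
    plan = []
    for full_name in tables:
        _catalog, schema, table = full_name.split(".")
        target = f"{backup_catalog}.{schema}.{table}"
        location = f"{base_path.rstrip('/')}/{schema}/{table}"
        plan.append(deep_clone_statement(full_name, target, location))
    return plan


# Hypothetical names and storage path, for illustration only.
for stmt in backup_plan(
    ["main.sales.orders"],
    backup_catalog="backup",
    base_path="abfss://backup@examplestorage.dfs.core.windows.net/uc",
):
    print(stmt)
```

Re-running the same statement later refreshes the clone, which is why `CREATE OR REPLACE` is used here rather than a plain `CREATE TABLE`.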

thanks,

Pat.

Ashley1
Contributor

Thanks fellas, the main driver for backup/restore is risk management: to have a procedure in place following complete failure or malicious acts. We have SCD2 tables in Databricks which are the only source of historic data. While I realise I have redundancy at the storage layer to cover partial failures, this doesn't cover total failure or malicious (or even accidental) acts.

prasadvaze
Valued Contributor II

@Ashley Betts Let me know if you found a way to back up and restore the UC metastore. It's a valuable feature because, unlike with an external hive_metastore, where I could go and see the metadata in the SQL Server tables (my external Hive metastore ran on SQL Server), the UC metadata is not accessible to me (it's stored in the Databricks control plane). The metadata I am referring to here is the schema, table and column names, and the ADLS folder path where the table data is stored for external tables.
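One partial workaround is to snapshot this metadata through SQL rather than through the backing database. The sketch below only builds the statements; the helper names and `main.sales.orders` are hypothetical, and it assumes the catalog exposes `information_schema.tables` and that the tables are Delta tables, so that `DESCRIBE DETAIL` reports a location. It captures table definitions only, not grants or other metastore state.

```python
from typing import List


def list_tables_statement(catalog: str) -> str:
    """Statement that lists the tables of one catalog via its information_schema."""
    return (
        "SELECT table_schema, table_name, table_type "
        f"FROM {catalog}.information_schema.tables "
        "WHERE table_schema <> 'information_schema'"
    )


def metadata_statements(full_name: str) -> List[str]:
    """Statements whose results capture a table's DDL and storage location."""
    return [
        f"SHOW CREATE TABLE {full_name}",  # column names, types, properties
        f"DESCRIBE DETAIL {full_name}",    # includes the table's location
    ]


# Hypothetical names, for illustration only.
print(list_tables_statement("main"))
for stmt in metadata_statements("main.sales.orders"):
    print(stmt)
```

Storing the results of these statements alongside the data copies gives you something to redefine the tables from later.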

@Pat Sienkiewicz When creating a UC catalog or schema, I provided an ADLS folder path (different from the UC metastore ADLS account) and created external Delta tables, because I want the storage cost to go to the business team. The upside of using managed tables, though, is auto-optimize and space cleanup when a table is dropped. Although Unity Catalog manages the SPN's permissions on the ADLS folder where the table data lives, those permissions are enforced only when a user accesses the folder through a Databricks workspace. They don't apply when a user runs a Python script using that same SPN against the ADLS folder from outside the workspace, say from an Azure Function app.

prasad_vaze
New Contributor III

Our UC managed tables are stored in a prod ADLS storage account that is different from the UC root storage account. So what's the best way to back up and restore UC managed tables into a different region? One option is to deep clone the tables, copy the ADLS folders to another region, and then redefine the tables on them in a metastore in that region. Is there a better way?

Aria
New Contributor III

@prasad_vaze Have you got any direction on this? I am in the same boat, looking for an approach to back up and restore the tables.
