Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Unity Catalog replication or disaster recovery implementation

Vinay123
New Contributor III

I am working on a disaster recovery implementation for Databricks on AWS.

I cannot find how to implement it with Unity Catalog.

I am planning to create two workspaces in two different regions: a primary workspace, which will be active, and a secondary workspace, which will be passive.

I want to keep the secondary workspace in sync with the primary one. There are two ways to do this, given below:

Databricks sync tool: there is no proper documentation on how to use it.

CI/CD: this is the approach I am planning to follow; it will deploy to both workspaces simultaneously.

I think the CI/CD approach keeps the control plane similar in both workspaces, but the problem is the data plane, and especially Unity Catalog, as I cannot find any blogs or documentation on how to replicate Unity Catalog and attach it to the secondary workspace.
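To make the CI/CD idea concrete, here is a minimal sketch of a pipeline step that sends the same job definition to both workspaces through the Jobs REST API (`POST /api/2.1/jobs/create`). The workspace URLs, tokens, and job spec are hypothetical placeholders, and a real pipeline would more likely use Terraform or a similar tool; this only illustrates the "deploy once, to both" pattern.

```python
import json
import urllib.request
from typing import Callable, Dict, List


def build_job_requests(
    job_spec: Dict,
    workspaces: Dict[str, str],
    tokens: Dict[str, str],
) -> List[urllib.request.Request]:
    """Build one 'create job' request per workspace from a single job spec.

    workspaces maps a label (e.g. "primary") to a workspace URL.
    tokens maps the same label to an access token (hypothetical values).
    """
    body = json.dumps(job_spec).encode("utf-8")
    requests = []
    for name, host in workspaces.items():
        requests.append(
            urllib.request.Request(
                url=f"{host.rstrip('/')}/api/2.1/jobs/create",
                data=body,
                method="POST",
                headers={
                    "Authorization": f"Bearer {tokens[name]}",
                    "Content-Type": "application/json",
                },
            )
        )
    return requests


def deploy(
    job_spec: Dict,
    workspaces: Dict[str, str],
    tokens: Dict[str, str],
    send: Callable = urllib.request.urlopen,
) -> Dict[str, Dict]:
    """Send the same job spec to every workspace and collect the responses."""
    results = {}
    reqs = build_job_requests(job_spec, workspaces, tokens)
    for name, req in zip(workspaces, reqs):
        with send(req) as resp:
            results[name] = json.loads(resp.read())
    return results
```

Calling `deploy(job_spec, workspaces, tokens)` from the pipeline would create the job in both regions from one definition. Note that this only covers workspace objects such as jobs; it does not move any table data or Unity Catalog metadata.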

Please let me know your thoughts on the control plane replication approach I am planning to follow.

Also, please let me know how we can replicate Unity Catalog to a different AWS region.

2 REPLIES

karthik_p
Esteemed Contributor

@Suram Vinay I have not implemented this myself, but I came across this blog previously. The Terraform scripts it describes should help with the DR setup. https://www.databricks.com/blog/2022/07/18/disaster-recovery-automation-and-tooling-for-a-databricks...

The control plane is not in our control; it is managed by Databricks, and Databricks will take care of it. Only the data plane has to be taken care of on our end.

Unity Catalog replication and a DR workspace are two different things. If I am not wrong, DR for a workspace will replicate everything except the Unity Catalog metastore; at most it may replicate catalog metadata/data if it is managed and related to that particular workspace. The UC metastore, on the other hand, is tied to the account level, and that is where my concern is. We need to see whether an S3-level multi-zone selection will help with replicating UC. What kind of data are you storing in Databricks: managed or external?

Vinay123
New Contributor III

I am storing the data in managed tables.
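For managed Delta tables, one option that may be worth evaluating is copying each table with Delta `DEEP CLONE`. The sketch below only generates the SQL statements; it is not a confirmed solution for this thread. It assumes that both the source and the target catalog are visible from the compute that runs the statements (how the source is exposed to the secondary region is outside this sketch), and the catalog, schema, and table names are hypothetical.

```python
from typing import Iterable, List, Tuple


def deep_clone_statements(
    tables: Iterable[Tuple[str, str]],
    source_catalog: str,
    target_catalog: str,
) -> List[str]:
    """Return one DEEP CLONE statement per (schema, table) pair.

    Each statement replaces the target table with a full copy of the
    source table's data and metadata.
    """
    statements = []
    for schema, table in tables:
        source = f"`{source_catalog}`.`{schema}`.`{table}`"
        target = f"`{target_catalog}`.`{schema}`.`{table}`"
        statements.append(f"CREATE OR REPLACE TABLE {target} DEEP CLONE {source}")
    return statements


# Hypothetical names, for illustration only.
statements = deep_clone_statements(
    [("sales", "orders"), ("sales", "customers")],
    source_catalog="prod",
    target_catalog="prod_dr",
)
for statement in statements:
    # In a Databricks notebook or job this would be: spark.sql(statement)
    print(statement)
```

Run on a schedule, re-executing the same statements refreshes the target copies, so the job can act as a periodic sync from the primary to the secondary region.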
