Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unity Catalog replication or disaster recovery implementation

Vinay123
New Contributor III

I am working on a disaster recovery (DR) implementation for Databricks on AWS.

I am not able to find out how to implement it with Unity Catalog.

I am planning to create two workspaces in two different regions: a primary workspace, which will be active, and a secondary workspace, which will be passive.

I want to keep the secondary workspace in sync with the primary one. There are two ways to do this, given below:

Databricks sync tool: there is no proper documentation on how to use it.

CI/CD: this is the approach I am planning to follow; it would deploy to both workspaces simultaneously.

I think the CI/CD approach keeps the control plane similar in both workspaces, but the problem is the data plane, and especially Unity Catalog, as there are no blogs or documentation on how to replicate Unity Catalog and attach it to the secondary workspace.
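To make the CI/CD idea concrete, here is a minimal sketch of what "deploy to both workspaces" could look like from a pipeline step. It assumes the job definitions are kept as JSON in the repository and are pushed through the Jobs API `POST /api/2.1/jobs/create` endpoint. The workspace URLs, the token lookup, and the function names are hypothetical placeholders, not something from the Databricks documentation.

```python
import json
import urllib.request

# Hypothetical workspace URLs; replace them with the real primary/secondary hosts.
WORKSPACES = {
    "primary": "https://primary-example.cloud.databricks.com",
    "secondary": "https://secondary-example.cloud.databricks.com",
}


def build_job_requests(job_spec, tokens, workspaces=WORKSPACES):
    """Build one Jobs API 'create' request per workspace from a single job spec."""
    body = json.dumps(job_spec).encode("utf-8")
    requests = {}
    for name, host in workspaces.items():
        requests[name] = urllib.request.Request(
            url=f"{host}/api/2.1/jobs/create",
            data=body,
            method="POST",
            headers={
                # tokens maps workspace name -> access token (assumed to come
                # from the CI/CD secret store).
                "Authorization": f"Bearer {tokens[name]}",
                "Content-Type": "application/json",
            },
        )
    return requests


def deploy(job_spec, tokens, send=urllib.request.urlopen):
    """Send the same job spec to every workspace and collect the JSON replies."""
    results = {}
    for name, request in build_job_requests(job_spec, tokens).items():
        with send(request) as response:
            results[name] = json.loads(response.read())
    return results
```

Note that `jobs/create` makes a new job on every call, so a real pipeline would need to track job IDs and update existing jobs instead (or use a tool such as Terraform that keeps state). This only covers workspace objects; it does nothing for the data or the Unity Catalog metastore.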

Please let me know your thoughts on the control plane replication approach I am planning to follow.

Also, please let me know how we can replicate Unity Catalog in a different AWS region.

2 REPLIES

karthik_p
Esteemed Contributor

@Suram Vinay I have not implemented this myself, but I came across this blog earlier. The Terraform scripts it describes should help with the DR setup. https://www.databricks.com/blog/2022/07/18/disaster-recovery-automation-and-tooling-for-a-databricks...

The control plane is not in our control; it is managed by Databricks, and Databricks takes care of it. Only the data plane has to be taken care of on our end.

Unity Catalog replication and workspace DR are two different things. If I am not wrong, DR for a workspace will replicate everything except the Unity Catalog metastore; at most it may replicate catalog metadata/data if it is managed and related to that particular workspace. The UC metastore, on the other hand, is tied to the account level, and that is where my concern is. We need to see whether S3-level multi-zone selection would help with replicating UC. What kind of data are you storing in Databricks: managed or external?

Vinay123
New Contributor III

I am storing the data in managed tables.
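For managed Delta tables, one option that could be explored is Delta `DEEP CLONE`, which copies a table's data and metadata to a target table. The sketch below only generates the SQL statements, one per table. It assumes the copies are written as external tables under an S3 prefix in the secondary region; the catalog name, the bucket path, and the helper name are hypothetical, and whether this fits a given Unity Catalog setup (external locations, permissions, registering the copies in the secondary region's metastore) is something I have not verified.

```python
def deep_clone_statements(tables, dr_catalog, dr_root):
    """Return one DEEP CLONE statement per source table.

    tables:     three-part names such as "main.sales.orders"
    dr_catalog: hypothetical catalog that holds the DR copies
    dr_root:    hypothetical S3 prefix in the secondary region
    """
    statements = []
    for full_name in tables:
        # Keep the schema and table name; only the catalog changes.
        _catalog, schema, table = full_name.split(".")
        target = f"{dr_catalog}.{schema}.{table}"
        location = f"{dr_root.rstrip('/')}/{schema}/{table}"
        statements.append(
            f"CREATE OR REPLACE TABLE {target} "
            f"DEEP CLONE {full_name} "
            f"LOCATION '{location}'"
        )
    return statements
```

In a notebook or scheduled job on the primary workspace, each statement could then be run with `spark.sql(stmt)`. Running the clone again on a schedule should refresh the copy, but grants and other Unity Catalog metadata would still have to be recreated separately on the secondary side.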
