
Unity Catalog replication or disaster recovery implementation

Vinay123
New Contributor III

I am working on a disaster recovery (DR) implementation for Databricks on AWS.

I cannot find how to implement it with Unity Catalog.

I am planning to create two workspaces in two different regions: a primary workspace, which will be active, and a secondary workspace, which will be passive.

I want to keep the secondary workspace in sync with the primary one. There are two options:

Databricks sync tool: there is no proper documentation on how to use it.

CI/CD: this is the approach I plan to follow. It would deploy to both workspaces simultaneously.
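
To make the CI/CD option concrete, here is a minimal sketch of a pipeline step that deploys the same project to both workspaces. It assumes the project is packaged as a Databricks Asset Bundle and that the bundle configuration defines two targets named `primary` and `secondary`, one per workspace; the thread does not say this, so the target names and the use of bundles are illustrative assumptions rather than the poster's actual setup.

```python
import subprocess
from typing import List, Sequence

# Hypothetical bundle target names; each would map to one workspace
# (primary region, secondary region) in the bundle configuration.
TARGETS = ("primary", "secondary")


def build_deploy_commands(targets: Sequence[str] = TARGETS) -> List[List[str]]:
    """Return one `databricks bundle deploy` command per target, in order."""
    return [["databricks", "bundle", "deploy", "--target", t] for t in targets]


def deploy_all(targets: Sequence[str] = TARGETS, dry_run: bool = True) -> List[List[str]]:
    """Deploy the same bundle to every target workspace.

    With dry_run=True the commands are only printed, which is useful for
    checking the pipeline before pointing it at real workspaces.
    """
    commands = build_deploy_commands(targets)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises if a deployment fails, so the pipeline stops
            # instead of silently leaving the two workspaces out of sync.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    deploy_all(dry_run=True)
```

This only covers workspace objects such as jobs and notebooks; it does not copy any table data or Unity Catalog metadata.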

I think the CI/CD approach makes the control plane similar in both workspaces, but the problem is the data plane, and especially Unity Catalog: I cannot find any blogs or documentation on how to replicate Unity Catalog and attach it to the secondary workspace.

Please let me know your thoughts on the control plane replication approach I am planning to follow.

Also, please let me know how we can replicate Unity Catalog to a different AWS region.

2 REPLIES

karthik_p
Esteemed Contributor

@Suram Vinay I have not implemented this myself, but I came across this blog earlier. Terraform scripts will help with the DR setup. https://www.databricks.com/blog/2022/07/18/disaster-recovery-automation-and-tooling-for-a-databricks...

The control plane is not under our control; it is managed by Databricks, which takes care of it. Only the data plane is our responsibility.

Unity Catalog replication and workspace DR are two different things. If I am not wrong, DR for a workspace replicates everything except the Unity Catalog metastore; at most it may replicate catalog metadata/data if it is managed and tied to that particular workspace. The UC metastore, however, is tied to the account level, and that is where my concern lies. We need to see whether S3-level multi-zone selection would help with replicating UC. What kind of data are you storing in Databricks: managed or external?

Vinay123
New Contributor III

I am storing the data in managed tables.
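
For managed tables, one possible approach (not something confirmed in this thread) is to copy each table into a catalog that belongs to the secondary region using Delta `DEEP CLONE`. The sketch below only generates the SQL statements. The catalog names `prod` and `prod_dr` and the table names are hypothetical, and it assumes the source tables are readable from wherever the statements run, which is the hard part across regions and is not solved here.

```python
from typing import Iterable, List


def deep_clone_statements(
    source_catalog: str, target_catalog: str, tables: Iterable[str]
) -> List[str]:
    """Build one DEEP CLONE statement per table.

    Each entry in `tables` must be a two-part name of the form "schema.table".
    """
    statements = []
    for table in tables:
        if table.count(".") != 1:
            raise ValueError(f"expected 'schema.table', got {table!r}")
        # Re-running CREATE OR REPLACE ... DEEP CLONE refreshes the copy
        # from the current state of the source table.
        statements.append(
            f"CREATE OR REPLACE TABLE {target_catalog}.{table} "
            f"DEEP CLONE {source_catalog}.{table}"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical catalog and table names, for illustration only.
    for stmt in deep_clone_statements("prod", "prod_dr", ["sales.orders"]):
        # In a Databricks notebook or job this would be spark.sql(stmt).
        print(stmt)
```

Scheduling such a job would give a periodically refreshed copy; how stale the copy may be depends on how often it runs.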
