11-24-2022 08:18 AM
Hi Liliana,
During the user group discussion, there was a mention of multi-cloud implementation with Databricks: if a workload fails in one cloud (say, Azure), it will run on another cloud vendor (say, AWS).
I imagine the storage across the clouds would have to be kept in sync (maybe with Delta deep clone?), and how does this work if managed tables are used?
Do you have any reference articles or documentation I can review? I would like to gain more insight into how this is designed and implemented.
12-01-2022 01:30 PM
Hi Wilson! Great question, and yes, you can achieve that with Deep Clone: https://www.databricks.com/blog/2021/04/20/attack-of-the-delta-clones-against-disaster-recovery-avai... Please give it a read; it outlines the whole process in detail!
Additionally, this blog post does a great job of describing, at a high level, how we built a lakehouse across multiple clouds: https://www.databricks.com/blog/2021/07/14/petabyte-scale-data-processing-across-multiple-cloud-plat...
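To make the idea concrete, here is a minimal sketch of the kind of incremental sync the Deep Clone post describes. The table name, storage path, and helper function below are all hypothetical; in a real job you would pass the generated statement to `spark.sql(...)` on a schedule, with cross-cloud storage credentials already configured.

```python
def deep_clone_sql(source_table: str, target_path: str) -> str:
    """Build a DEEP CLONE statement for a disaster-recovery sync.

    DEEP CLONE copies both the data files and the table metadata to the
    target location. Re-running the same clone is incremental: only files
    added or changed since the previous clone are copied.
    """
    return (
        f"CREATE OR REPLACE TABLE delta.`{target_path}` "
        f"DEEP CLONE {source_table}"
    )

# Hypothetical names; in Databricks this would be spark.sql(stmt)
stmt = deep_clone_sql("prod.sales_orders", "s3://dr-bucket/sales_orders")
print(stmt)
```

Running this on a regular cadence keeps the replica in the secondary cloud close to the primary; the recovery point is bounded by the clone interval.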
12-12-2022 09:32 AM
Thank you for the links!
09-11-2023 04:42 AM
Hey there, Wilson-Mok!
Multi-cloud implementation with Databricks is an interesting challenge, isn't it? You're on the right track thinking about data synchronization.
One way to keep data consistent between clouds is to use a replication workflow, perhaps combining Delta Lake with an orchestration tool like Apache Airflow or dbt (data build tool). Delta Lake's deep clone can indeed be useful here: you can periodically replicate data from one cloud's storage to another with it.
However, when it comes to managed tables, things get trickier. You'll need to ensure that the metastore holding your managed tables' metadata is accessible from both clouds; a mechanism like an external Hive metastore can serve as a unified metadata repository. You can also find some useful thoughts here: Cloud Data Migration Challenges: Explore 6 Best Strategies in 2023.
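One common way around the managed-table problem is to register the cloned Delta files as an external (unmanaged) table in the secondary cloud's metastore, so failed-over workloads can query the replica by name rather than by path. A minimal sketch, with a hypothetical helper, table name, and path:

```python
def register_replica_sql(table_name: str, location: str) -> str:
    """Build a statement that registers an existing Delta location as an
    external table in the metastore. Because the table is unmanaged, the
    metastore entry only points at the files; dropping it later would not
    delete the replicated data.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING DELTA LOCATION '{location}'"
    )

# Hypothetical names; in Databricks this would be spark.sql(stmt)
stmt = register_replica_sql("dr.sales_orders", "s3://dr-bucket/sales_orders")
print(stmt)
```

Run once per replicated table in the secondary workspace; after that, the periodic deep clones refresh the data underneath the registered name.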
As for reference material, the official Databricks documentation includes valuable guidance and best practices for multi-cloud setups. Community forums and blogs are also good places to learn from real-world experiences.