Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks interoperability between cloud environments

Murugan
New Contributor II

While Databricks is currently available on and integrated with all three major cloud platforms (Azure, AWS, GCP), the following pertinent questions come up in real-world scenarios:

1) Can Databricks be cloud agnostic? That is, if we develop notebooks for data engineering pipelines in Azure Databricks, can they be exported and reused on another cloud platform if the client wants to switch clouds at a later point in time?

2) How about Delta Lake / Delta table support in the above scenario? For example, if the root folder containing a Delta table's data is copied to different cloud storage (e.g., ADLS to S3) and a Delta table is created on top of that data, will it work seamlessly? (A sketch of this scenario appears after these questions.)

3) In a multi-cloud environment, what are the possibilities for using the same notebooks/code in Databricks workspaces across the cloud environments?

Is there any documentation available for the above pointers? A reference would be helpful if available.
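
For illustration, the flow in question 2 would look roughly like this, assuming the entire Delta table directory (including the _delta_log folder) has already been copied from ADLS to S3; the table name and bucket path are hypothetical, and the snippet assumes a Databricks notebook where spark is predefined:

    # Delta's transaction log references data files relative to the table root,
    # so registering a table over a faithfully copied directory generally works.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales
        USING DELTA
        LOCATION 's3://my-bucket/delta/sales'
    """)

    # Or read the copied data directly, without registering a table:
    df = spark.read.format("delta").load("s3://my-bucket/delta/sales")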

4 Replies

Hubert-Dudek
Esteemed Contributor III

I have a few thoughts:

  • You can use the same runtime version, so the Spark version will certainly be the same on all clouds.
  • Problems can sometimes arise with newer Databricks features such as SQL alerts, as they are introduced at different times depending on the platform (usually this is not much of a problem, since brand-new features tend to be used only for development/training purposes at first).
  • Regarding storage, you can still mount S3 on Azure and Azure Data Lake on AWS; the connections just won't go through dedicated private endpoints, and traffic-out charges will apply (see the mount sketch after this list).
  • Regarding code, Repos will make your life easier, as all notebooks etc. can be hosted in git (GitHub, CodeCommit, Azure DevOps, or elsewhere).
  • The biggest, but also solvable, issue I see is code that automatically deploys pools and clusters through the CLI, as server specs differ between platforms (VM names, drive types), so you could write a script for that with some mapping (which VM in Azure = which VM in AWS); a sketch of such a mapping follows the mount example below.
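
A minimal sketch of the cross-cloud mount described above, mounting an S3 bucket from an Azure Databricks workspace; the bucket, secret scope, and key names are hypothetical, and the snippet assumes a notebook where dbutils and spark are predefined:

    # AWS credentials pulled from a Databricks secret scope (names are assumptions).
    access_key = dbutils.secrets.get(scope="aws", key="access-key")
    secret_key = dbutils.secrets.get(scope="aws", key="secret-key")

    # Mount the S3 bucket; traffic leaves Azure, so egress charges apply.
    dbutils.fs.mount(
        source="s3a://my-bucket",
        mount_point="/mnt/s3-data",
        extra_configs={
            "fs.s3a.access.key": access_key,
            "fs.s3a.secret.key": secret_key,
        },
    )

    # The mounted path can then be used like any other DBFS path.
    df = spark.read.format("delta").load("/mnt/s3-data/delta/sales")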
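
And a sketch of the VM-mapping idea from the last bullet; the node type names and Azure-to-AWS equivalences below are rough assumptions, not official guidance, so check specs and pricing before relying on them:

    import json

    # Roughly comparable node types (assumed equivalences, verify before use).
    NODE_TYPE_MAP = {
        "Standard_DS3_v2": "m5.xlarge",
        "Standard_DS4_v2": "m5.2xlarge",
        "Standard_E8s_v3": "r5.2xlarge",
    }

    def cluster_spec(name, azure_node_type, cloud, workers=2):
        """Build a cluster JSON spec, e.g. for `databricks clusters create --json-file`."""
        node_type = azure_node_type if cloud == "azure" else NODE_TYPE_MAP[azure_node_type]
        return json.dumps({
            "cluster_name": name,
            "spark_version": "10.4.x-scala2.12",  # same runtime string on every cloud
            "node_type_id": node_type,
            "num_workers": workers,
        }, indent=2)

    print(cluster_spec("etl-cluster", "Standard_DS3_v2", cloud="aws"))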

-werners-
Esteemed Contributor III

Databricks wants to avoid vendor lock-in, so in theory it is cloud platform agnostic.

However, this does not just work out of the box. You have to think about all the configuration you did on your Databricks workspace and apply the same configuration on the other cloud platform; not literally the same, but conceptually the same (e.g., ADLS vs. S3, firewalls, git, jars, ...).

The code itself will work. I am not aware of any capabilities being unavailable on a particular cloud provider, preview functionality excluded!

The cool part is that DBFS, on which Databricks operates, is a semantic layer over your physical storage.

So as long as your DBFS paths are the same across providers, you will be OK. For example:
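
A minimal illustration of that point; the mount point and table paths are hypothetical. If each workspace mounts its own cloud's storage at the same mount point (ADLS on Azure, S3 on AWS, both at /mnt/lake), the notebook code itself never changes:

    # Identical on every cloud, because the cloud-specific URI is hidden behind the mount.
    df = spark.read.format("delta").load("/mnt/lake/silver/customers")
    df.write.format("delta").mode("append").save("/mnt/lake/gold/customer_summary")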

But any config should be taken into account.

Hubert-Dudek
Esteemed Contributor III

Regarding "certain possibilities not being available on a cloud provider": roadmaps are different for every platform. I think the situation today is that all platforms are the same, but it can probably be a bit different after Christmas 🙂

Anonymous
Not applicable

You'll be interested in the Unity Catalog.

The notebooks should be the same across all the clouds, and there are no syntax differences. The key things will be changing paths from S3 to ADLS Gen2 and handling different usernames/logins across the different accounts, as werners mentioned above. For example:
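
A small sketch of that path change, parameterizing the storage root per cloud so only one value differs between providers; the URIs and the widget name are hypothetical:

    # One config value per cloud; the rest of the notebook stays unchanged.
    STORAGE_ROOTS = {
        "azure": "abfss://lake@mystorageacct.dfs.core.windows.net",
        "aws": "s3://my-company-lake",
    }
    cloud = dbutils.widgets.get("cloud")  # assumed widget, set per workspace or job
    root = STORAGE_ROOTS[cloud]

    df = spark.read.format("delta").load(root + "/silver/customers")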
