12-02-2021 12:51 AM
While Databricks is currently available and integrated into all three major cloud platforms (Azure, AWS, GCP), the following questions come up in real-world scenarios:
1) Can Databricks be cloud agnostic? For example, if we develop notebooks for data engineering pipelines with Azure Databricks, can they be exported and used on another cloud platform if the client wants to switch clouds at a later point in time?
2) How does Delta Lake / Delta table support behave in these environments in the above scenario? For example, if the root folder containing the Delta table data is copied to a different cloud storage (e.g., ADLS to S3) and a Delta table is created on top of that data, will it work seamlessly?
3) In a multi-cloud environment, what are the options for using the same notebooks/code in Databricks workspaces across the cloud environments?
Is there any documentation available for the above points? A reference would be helpful if one is available.
12-02-2021 02:32 AM
I have a few thoughts:
12-02-2021 07:39 AM
Databricks wants to avoid vendor lock-in, so in theory it is cloud platform agnostic.
However, this does not work out of the box. You have to take all the configuration you did on your Databricks workspace and apply the equivalent configuration on the other cloud platform; not literally the same, but conceptually the same (e.g. ADLS vs S3, firewalls, git, jars, ...).
The code itself will work. I am not aware of any features being unavailable on a particular cloud provider, preview functionality excluded!
The cool part is that DBFS, which Databricks works on top of, is a semantic layer over your physical storage.
So as long as your DBFS paths are the same across providers, you will be OK (see the sketch below).
But any config should be taken into account.
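To illustrate the "same DBFS paths" point, here is a minimal sketch of mounting cloud storage to one common mount point on each platform, so the pipeline code itself never changes. The storage account, container/bucket names, secret scope, and keys below are hypothetical placeholders, not values from this thread:

```python
# Mount cloud storage under the same DBFS path on each cloud, so notebook
# code that reads/writes /mnt/datalake stays identical across providers.

# On Azure Databricks (ADLS Gen2) -- hypothetical account/container/secret names:
dbutils.fs.mount(
    source="abfss://raw@mystorageaccount.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs={
        "fs.azure.account.key.mystorageaccount.dfs.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="adls-key")
    },
)

# On Databricks on AWS, the equivalent mount would point at a bucket instead:
# dbutils.fs.mount(source="s3a://my-raw-bucket/", mount_point="/mnt/datalake")

# Either way, the pipeline code is unchanged:
df = spark.read.format("delta").load("/mnt/datalake/sales")
df.write.format("delta").mode("append").save("/mnt/datalake/sales_curated")
```

The mount itself (credentials, networking) is the per-cloud config mentioned above; only that part has to be redone when you move platforms.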
12-02-2021 11:37 AM
" certain possibilities not being available on a cloud provider" roadmaps are different for every platform. I think today situation is that all platforms are the same but probably it can be a bit different after Christmas 🙂
12-03-2021 04:49 AM
You'll be interested in the Unity Catalog.
The notebooks should be the same across all the clouds, and there are no syntax differences. The key things are going to be changing paths from S3 to ADLS Gen2 and handling different usernames/logins across the different accounts, as werners mentioned above.
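One way to keep the path change from touching notebook logic is to inject the storage root per workspace, for example through a widget or job parameter. A minimal sketch, assuming the only per-cloud difference is the storage URI; the widget name, paths, and table names here are hypothetical:

```python
# Parameterize the storage root so the same notebook runs on Azure, AWS, or GCP.
dbutils.widgets.text("storage_root", "/mnt/datalake")  # set per workspace or job
storage_root = dbutils.widgets.get("storage_root")

# e.g. "abfss://raw@account.dfs.core.windows.net" on Azure,
#      "s3a://my-raw-bucket" on AWS,
#      "gs://my-raw-bucket"  on GCP.

events = spark.read.format("delta").load(f"{storage_root}/events")
(events.groupBy("event_type").count()
    .write.format("delta")
    .mode("overwrite")
    .save(f"{storage_root}/event_counts"))
```

With that pattern, exporting the notebook to another cloud only requires setting a different parameter value, not editing the code.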