Hubert-Dudek
Databricks MVP

I have a few thoughts:

  • You can use the same Databricks Runtime version on every cloud, so the Spark version will definitely be the same everywhere,
  • New Databricks features (SQL alerts, etc.) can sometimes be a problem, as they are introduced at different times depending on the platform (usually this is not a big issue, because when something is new you typically use it only for development/training purposes at first),
  • Regarding storage, you can still mount S3 on Azure and Azure Data Lake on AWS; the connections just won't go through dedicated private endpoints, and outbound data transfer charges will apply,
  • Regarding code, "Repos" will make your life easier, as all notebooks etc. can be hosted in git (GitHub, AWS CodeCommit, Azure DevOps or somewhere else),
  • The biggest, but also solvable, issue I see is the code that automatically deploys pools and clusters through the CLI, as server specs differ between platforms (VM name, hard drive name), so you could write a script for that with some mapping (which VM in Azure = which VM in AWS).
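
For the last point, here is a minimal sketch of what such a mapping script could look like. The VM pairs in AZURE_TO_AWS are only illustrative assumptions (check sizes and pricing yourself), and translate_cluster_spec is a hypothetical helper name, not part of any Databricks tool. It only rewrites the node type fields of a cluster definition and leaves everything else, including the runtime version, untouched:

```python
import json

# Hypothetical mapping of roughly comparable node types (Azure -> AWS).
# These pairs are assumptions for illustration; verify them for your workloads.
AZURE_TO_AWS = {
    "Standard_DS3_v2": "i3.xlarge",
    "Standard_DS4_v2": "i3.2xlarge",
}


def translate_cluster_spec(spec, mapping):
    """Return a copy of the cluster spec with its node types translated."""
    translated = dict(spec)
    for key in ("node_type_id", "driver_node_type_id"):
        if key in translated:
            source_type = translated[key]
            if source_type not in mapping:
                # Fail loudly rather than deploy an unmapped VM type.
                raise KeyError(f"No mapping for node type {source_type!r}")
            translated[key] = mapping[source_type]
    return translated


# Example cluster definition as it might be used on Azure.
azure_spec = {
    "cluster_name": "etl-cluster",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
}

aws_spec = translate_cluster_spec(azure_spec, AZURE_TO_AWS)
print(json.dumps(aws_spec, indent=2))
```

The resulting JSON could then be saved to a file and used when creating the cluster through the Databricks CLI or the Clusters REST API. Cloud-specific settings such as disk types would need a similar mapping of their own.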

My blog: https://databrickster.medium.com/