07-13-2022 11:45 PM
Which cloud hosting environment is best to use for Databricks? My question comes down to this: there must be some differences in latency, throughput, result consistency, and reproducibility between the cloud hosting environments for Databricks. How can I decide which one is best to use? What are the minor difficulties with the others?
07-14-2022 05:06 AM
@Vikas Sinha Databricks works the same on all of the supported cloud platforms. Choosing the cloud vendor depends on your business requirements. To learn more about how Databricks works on these cloud platforms, you can refer to the product pages.
09-03-2022 11:59 PM
Hi @Vikas Sinha
Does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?
We'd love to hear from you.
Thanks!
2 weeks ago - last edited 2 weeks ago
The main Databricks experience is essentially the same on both Azure and GCP. The difference is in the cloud infrastructure that supports them.
Azure Databricks is somewhat more tightly integrated with Azure services such as Azure Data Lake and Synapse Analytics, and with the Microsoft ecosystem in general. This can be a real advantage if you are already using them.
Besides that, it also offers stronger security for enterprise customers in some respects. If you use GCP's BigQuery or other Google-native services, on the other hand, GCP Databricks might fit your needs better.
One minor thing you may notice is that GCP's networking and latency can be quicker for certain workloads, depending on the location of your clusters.
At the same time, this is very specific to the particular workload.
Therefore, if you are concerned about performance, consider where your data is stored and what the rest of your cloud infrastructure looks like, so that you run Databricks on the cloud that best fits your needs.
Note that latency differences are typically small unless you run many heavy, real-time streaming workloads.
Some people say that GCP is easier to scale, particularly for ML pipelines. However, this depends on the use case.
So, don't just think about Databricks; think about the whole stack.
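If you want numbers rather than opinions on the latency point, one rough approach is to time the same workload several times on each candidate cloud and compare the summaries. The following is only a minimal sketch: `time_workload` and `example_workload` are hypothetical names introduced here for illustration, and the stand-in workload is plain Python, not a Databricks API. On a real cluster you would replace it with your own job, run in the same region and on a comparable cluster size on each cloud.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_workload(
    workload: Callable[[], object], runs: int = 5, warmup: int = 1
) -> Dict[str, float]:
    """Run `workload` repeatedly and summarize wall-clock latency in seconds."""
    for _ in range(warmup):
        workload()  # warm-up runs are executed but not timed

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)

    return {
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def example_workload() -> int:
    # Hypothetical stand-in for a real job. On a cluster this could be
    # something like reading a table and counting its rows.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(time_workload(example_workload))
```

The median is usually a more useful figure to compare than a single run, and the spread between the minimum and maximum gives a rough sense of how consistent the results are from run to run.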
More information about Microsoft Azure can be found here
a week ago
Both Azure Databricks and GCP Databricks offer powerful capabilities, but Azure Databricks is generally preferred for tighter enterprise integration, while GCP Databricks excels in flexibility and cost-efficiency. The best choice depends on your organization's cloud strategy, existing infrastructure, and specific use cases.
Choose Azure Databricks if:
Your organization already uses Azure services extensively.
You need enterprise-grade security, compliance, and governance.
You want tight integration with tools like Power BI, Azure Synapse, or Azure ML.
Choose GCP Databricks if:
Your team prefers open cloud architecture and flexibility.
You're focused on cost optimization and scalable AI/ML workloads.
You use GCP-native tools like BigQuery or Vertex AI.
a week ago
@VikasSinha In my experience Databricks is not stable on any cloud: jobs and clusters keep crashing. Use Polars or DuckDB instead.