05-02-2024 05:29 AM
We are running an Interactive Cluster in production on a compute type that costs X. What options are available that offer roughly the same processing power as the current compute but at a cost Y that is lower than X?
05-02-2024 04:22 PM
Hello @Ikanip ,
You can utilize the Databricks Pricing Calculator to estimate costs.
For detailed information on compute capacity, please refer to your cloud provider's documentation regarding Virtual Machine instance types.
05-02-2024 11:15 PM
Hi Raphael,
Thanks for this.
Let me give you a specific example.
Let's say that on Azure I choose DS5 v2, which has 16 cores and 56 GB of RAM and costs $1,861.50/month as PayG. Then I choose D16s v3, which also has 16 cores but 64 GB of RAM, and costs $1,489.20/month as PayG, which is less than the former. They are probably using the same 3rd Generation Intel® Xeon® processors. So what is the difference between them, and why do they differ in cost?
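To make the comparison concrete, here is a minimal Python sketch that normalises the two quoted prices per vCPU and per GB of RAM. The figures are only the ones quoted in this post (actual prices vary by region and over time), and the `VmQuote` class and `relative_saving` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmQuote:
    """One VM size with the list price quoted in this thread (illustrative only)."""
    name: str
    vcpus: int
    ram_gb: int
    monthly_usd: float

    @property
    def usd_per_vcpu(self) -> float:
        return self.monthly_usd / self.vcpus

    @property
    def usd_per_gb(self) -> float:
        return self.monthly_usd / self.ram_gb


def relative_saving(old: VmQuote, new: VmQuote) -> float:
    """Fraction of the old monthly price saved by switching to the new size."""
    return (old.monthly_usd - new.monthly_usd) / old.monthly_usd


# Prices as quoted above; check the current price list for your own region.
ds5_v2 = VmQuote("DS5 v2", vcpus=16, ram_gb=56, monthly_usd=1861.50)
d16s_v3 = VmQuote("D16s v3", vcpus=16, ram_gb=64, monthly_usd=1489.20)

for vm in (ds5_v2, d16s_v3):
    print(f"{vm.name}: ${vm.usd_per_vcpu:.2f}/vCPU, ${vm.usd_per_gb:.2f}/GB RAM per month")

print(f"Saving when moving to {d16s_v3.name}: {relative_saving(ds5_v2, d16s_v3):.0%}")
```

On these numbers the D16s v3 is about 20% cheaper per month. The calculation says nothing about performance: vCPU count and RAM are only two of the dimensions on which VM sizes differ.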
05-03-2024 08:03 AM - edited 05-03-2024 08:25 AM
Hi @Ikanip, I suggest checking this with the cloud provider; unfortunately I don't have the details. Databricks cost estimation relies to some extent on the cloud provider's cost estimation.
05-03-2024 12:39 PM
This is exactly as @raphaelblg mentioned.
You have to dig into the Microsoft docs on VM sizes.
You can't just ask "it has less memory, so why is it more expensive?" There is more to it than that.
In your example comparing the Dv2 and Dv3 series, the docs say that Microsoft changed the memory-to-CPU ratio to make the machines more efficient, and that Dv3 runs in a hyper-threaded configuration, so a Dv3 vCPU is a hyper-thread rather than a full physical core. They also adjusted the disk and network limits to align with the move to hyper-threading.
Hyper-threading = improved parallelization of computations.
The DS5 v2 has much higher IOPS and network bandwidth.
If you are looking for memory-optimized machines, the advice is to move to the Ev3 and Esv3 series.
Please also be aware of Azure regions: machine "X" may be more expensive than machine "Y" in one region, but that is not necessarily the case in another.
So if you swap your compute, you might see a drop in performance.
If you are looking for savings, you need to:
- test different VMs
- check spot instances
- run Job clusters instead (maybe with a pool for faster start-up)
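As a rough illustration of the last two points, the sketch below builds a job-cluster definition that requests Azure spot VMs with on-demand fallback. It assumes the `new_cluster` shape used by the Databricks Jobs/Clusters API; the runtime version, node type, and worker count are placeholder values, and `job_cluster_spec` is a hypothetical helper, not part of any Databricks library.

```python
import json


def job_cluster_spec(node_type_id: str, num_workers: int, use_spot: bool = True) -> dict:
    """Build a `new_cluster` payload for a job cluster (all values are assumptions)."""
    spec = {
        "spark_version": "14.3.x-scala2.12",  # placeholder: choose a supported runtime
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }
    if use_spot:
        spec["azure_attributes"] = {
            # Use spot VMs and fall back to on-demand if they are evicted or unavailable.
            "availability": "SPOT_WITH_FALLBACK_AZURE",
            # Keep the first node (the driver) on-demand for stability.
            "first_on_demand": 1,
            # -1 caps the spot bid at the current on-demand price.
            "spot_bid_max_price": -1,
        }
    return spec


# Candidate VM size to benchmark against the current one.
candidate = job_cluster_spec("Standard_D16s_v3", num_workers=2)
print(json.dumps(candidate, indent=2))
```

Running the same workload with a few different `node_type_id` values and comparing run time against cost is one way to do the "test different VMs" step. If you attach the cluster to a pool instead, the node type and spot settings are normally defined on the pool rather than in the cluster spec.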
I hope I was able to clarify a few things for you.