Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

How do I choose compute, and how do I find alternatives to the compute currently in use?

Ikanip
New Contributor II

We are using a compute for an interactive cluster in production, which costs X. We want to know what options are available that offer roughly the same processing power as the current compute but cost Y, where Y is less than X.

raphaelblg
Honored Contributor II

Hello @Ikanip ,

You can utilize the Databricks Pricing Calculator to estimate costs.

For detailed information on compute capacity, please refer to your cloud provider's documentation regarding Virtual Machine instance types.

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks

Ikanip
New Contributor II

Hi Raphael,

Thanks for this. 

I will give you a specific example.

Let's say that on Azure I choose DS5 v2, which has 16 cores and 56 GB of RAM and costs $1,861.50/month pay-as-you-go (PayG). Then I choose D16s v3, which also has 16 cores but 64 GB of RAM and costs $1,489.20/month PayG, which is less than the former. They probably both use the same 3rd Generation Intel® Xeon® processors. So what is the difference between them, and why do they cost differently?
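A quick way to compare the two quotes is to normalize them by the resources you get. This is a minimal sketch that uses only the monthly PayG prices and specs quoted above; it deliberately ignores disk, IOPS and network limits, which are part of what a VM price pays for.

```python
# Monthly pay-as-you-go prices and specs as quoted in the post (Azure, USD).
vms = {
    "DS5 v2": {"vcpus": 16, "ram_gb": 56, "usd_per_month": 1861.50},
    "D16s v3": {"vcpus": 16, "ram_gb": 64, "usd_per_month": 1489.20},
}


def unit_prices(spec):
    """Return (USD per vCPU-month, USD per GB-of-RAM-month) for one VM size."""
    return (spec["usd_per_month"] / spec["vcpus"],
            spec["usd_per_month"] / spec["ram_gb"])


for name, spec in vms.items():
    per_vcpu, per_gb = unit_prices(spec)
    print(f"{name}: ${per_vcpu:.2f} per vCPU, ${per_gb:.2f} per GB RAM")

# Relative monthly saving of D16s v3 versus DS5 v2.
saving = 1 - vms["D16s v3"]["usd_per_month"] / vms["DS5 v2"]["usd_per_month"]
print(f"D16s v3 is {saving:.0%} cheaper per month")
```

On a per-vCPU and per-GB basis the D16s v3 looks clearly cheaper; the comparison only becomes fair once you account for what a vCPU actually is on each series and for the disk and network limits.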

raphaelblg
Honored Contributor II

Hi @Ikanip, I suggest checking this with the cloud provider; unfortunately, I don't have those details. Databricks cost estimation relies to some extent on the cloud provider's own pricing.
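As a rough illustration of why the cloud provider's price matters: on Azure Databricks you typically pay Azure for the VMs and pay Databricks in DBUs for every node that runs. The estimator below is a hypothetical sketch; the VM hourly price, DBUs per node-hour and price per DBU are placeholders you would look up in the Databricks pricing calculator and the Azure price list for your region and tier, not real figures.

```python
from dataclasses import dataclass


@dataclass
class ClusterCostInputs:
    # All values are hypothetical placeholders; look up real ones for your region/tier.
    vm_usd_per_hour: float    # cloud provider's price for one VM (node)
    dbu_per_node_hour: float  # DBUs one node of this size consumes per hour
    usd_per_dbu: float        # Databricks price per DBU for the workload type
    nodes: int                # driver + workers
    hours_per_month: float    # hours the cluster is actually running


def monthly_cost(c: ClusterCostInputs) -> float:
    """Estimated monthly cost = (VM cost + DBU cost) per node-hour * node-hours."""
    per_node_hour = c.vm_usd_per_hour + c.dbu_per_node_hour * c.usd_per_dbu
    return per_node_hour * c.nodes * c.hours_per_month


# Made-up inputs, not real prices.
example = ClusterCostInputs(vm_usd_per_hour=1.0, dbu_per_node_hour=2.0,
                            usd_per_dbu=0.5, nodes=3, hours_per_month=100)
print(f"${monthly_cost(example):,.2f} per month")  # $600.00 per month
```

Because all-purpose (interactive) compute and jobs compute are billed at different DBU rates, the same VM size can cost different amounts depending on how the cluster is used, and an interactive cluster keeps accruing both charges for as long as it stays up.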

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks

Wojciech_BUK
Valued Contributor III (accepted solution)

It is exactly as @raphaelblg mentioned.
You have to dig into the Microsoft docs on VM sizes.

You can't just look at it as "it has less memory, so why is it more expensive?" There is more to it than that.

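In your example, where you compare the Dv2 series with the Dv3 series, the docs show that Microsoft changed the memory-to-CPU ratio to make it more efficient, and that Dv3 runs in a hyper-threaded configuration. They also adjusted the disk and network limits to align with the move to hyper-threading.
Hyper-threading improves the parallelization of computations, but it also means that a Dv3 vCPU is a hardware thread rather than a full physical core: the 16 vCPUs of a D16s v3 run on 8 physical cores, while the 16 vCPUs of a DS5 v2 are 16 physical cores.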
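To see what the hyper-threading change means for the price comparison, you can normalize by physical cores instead of vCPUs. This sketch assumes one vCPU per physical core for DS5 v2 and two vCPUs (hyper-threads) per physical core for D16s v3, as Azure documents for the Dv2 and Dv3 series; check the current VM size docs before relying on it. Prices are the PayG figures quoted in the question.

```python
# vcpus_per_core is an assumption based on Azure's Dv2/Dv3 series docs:
# Dv2/DSv2 expose one vCPU per physical core, Dv3/Dsv3 expose two (hyper-threaded).
vms = {
    "DS5 v2": {"vcpus": 16, "vcpus_per_core": 1, "ram_gb": 56, "usd_per_month": 1861.50},
    "D16s v3": {"vcpus": 16, "vcpus_per_core": 2, "ram_gb": 64, "usd_per_month": 1489.20},
}


def physical_cores(spec):
    """Number of physical cores behind the advertised vCPUs."""
    return spec["vcpus"] // spec["vcpus_per_core"]


for name, spec in vms.items():
    cores = physical_cores(spec)
    print(f"{name}: {cores} physical cores, "
          f"{spec['ram_gb'] / spec['vcpus']:.1f} GB RAM per vCPU, "
          f"${spec['usd_per_month'] / cores:.2f} per physical core-month")
```

Measured per physical core, the D16s v3 is the more expensive of the two. Two hyper-threads sharing a core generally do not deliver the throughput of two full cores, so a cheaper 16-vCPU size is not automatically a like-for-like replacement for CPU-bound Spark workloads.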

DS5 v2 also gives you much higher IOPS and network bandwidth.

If you are looking for memory-optimized machines, it is advised that you move to the Ev3 and Esv3 series.

Please also be aware of Azure regions: machine "X" may be more expensive than machine "Y" in one region but not in another 🙂
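One way to check regional differences is Azure's public Retail Prices REST API (`https://prices.azure.com/api/retail/prices`), which accepts an OData `$filter`. The sketch below is only a starting point under assumptions: the field names (`armSkuName`, `armRegionName`, `retailPrice`, `productName`, `meterName`, `NextPageLink`) and the convention that Windows and spot meters are marked in `productName`/`meterName` should be verified against Microsoft's documentation. It looks only at Linux pay-as-you-go VM prices and ignores the separately priced Databricks DBU charge.

```python
import json
import urllib.parse
import urllib.request

API = "https://prices.azure.com/api/retail/prices"


def price_query_url(arm_sku_name: str) -> str:
    """Build a Retail Prices API URL for pay-as-you-go prices of one VM SKU."""
    odata_filter = (
        "serviceName eq 'Virtual Machines' "
        f"and armSkuName eq '{arm_sku_name}' "
        "and priceType eq 'Consumption'"
    )
    return API + "?$filter=" + urllib.parse.quote(odata_filter)


def fetch_items(url: str) -> list:
    """Follow NextPageLink until every page is read (makes network calls)."""
    items = []
    while url:
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        items.extend(page.get("Items", []))
        url = page.get("NextPageLink")
    return items


def cheapest_linux_regions(items: list, top: int = 5) -> list:
    """Return (region, USD/hour) pairs sorted by price, skipping Windows and spot meters."""
    linux = [
        (i["armRegionName"], i["retailPrice"])
        for i in items
        if "Windows" not in i.get("productName", "")       # assumed Windows marker
        and "Spot" not in i.get("meterName", "")            # assumed spot marker
        and "Low Priority" not in i.get("meterName", "")
    ]
    return sorted(linux, key=lambda pair: pair[1])[:top]
```

Usage would be `cheapest_linux_regions(fetch_items(price_query_url("Standard_D16s_v3")))`; keeping the network call separate from the filtering makes the ranking logic easy to test offline.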

So if you swap your compute, you might see a drop in performance.

If you are looking for savings, you should:
- test different VMs
- check spot instances
- run job clusters instead (possibly with a pool for faster start-up)
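The suggestions above can be combined in a job definition. This sketch builds a Databricks Jobs API 2.1 `jobs/create` request body that runs a notebook on a job cluster instead of an interactive cluster. The function name, job name, cluster key, notebook path, Spark version string and pool ID are hypothetical placeholders. As the Databricks docs describe it, a pool-backed cluster takes its VM size and spot/on-demand settings from the pool itself, so the sketch only sets `azure_attributes` when no pool is used.

```python
from typing import Optional


def job_payload(node_type_id: str, num_workers: int,
                instance_pool_id: Optional[str] = None) -> dict:
    """Build a hypothetical Jobs API 2.1 create-job body that uses a job cluster."""
    new_cluster = {
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "num_workers": num_workers,
    }
    if instance_pool_id:
        # Pool-backed nodes: VM size and spot settings are defined on the pool.
        new_cluster["instance_pool_id"] = instance_pool_id
        new_cluster["driver_instance_pool_id"] = instance_pool_id
    else:
        new_cluster["node_type_id"] = node_type_id
        new_cluster["azure_attributes"] = {
            "first_on_demand": 1,                        # driver stays on-demand
            "availability": "SPOT_WITH_FALLBACK_AZURE",  # spot workers, on-demand fallback
            "spot_bid_max_price": -1,                    # not evicted on the basis of price
        }
    return {
        "name": "nightly-etl",  # hypothetical job name
        "job_clusters": [{"job_cluster_key": "etl_cluster", "new_cluster": new_cluster}],
        "tasks": [{
            "task_key": "etl",
            "job_cluster_key": "etl_cluster",
            "notebook_task": {"notebook_path": "/Workspace/etl/main"},  # hypothetical path
        }],
    }
```

You would POST the result of, for example, `job_payload("Standard_D16s_v3", 4)` to `/api/2.1/jobs/create`. Changing only `node_type_id` between runs is a simple way to test different VM sizes against the same workload and compare run time with cost.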

I hope I was able to clarify a few things for you 🙂
