Databricks costing - Need details of the Azure VM costs

sanket-kelkar
New Contributor II

Hi All,

We are using the Azure Databricks platform for one of our data engineering needs. Here's my setup:

1. Job compute with a cluster of 1 driver and 2 workers, all of type 'Standard_DS3_v2' (Photon is disabled).

2. The job compute takes its instances from an instance pool, since we want to reduce the cluster start-up time. The instance pool uses the "All spot" setting and keeps 3 instances idle.

How do I run the job?

1. The job is run via workflows every 30 minutes. It takes 7 to 8 minutes to complete.

The cost of this setup?

Based on my research, I have come up with the following cost estimate:

1. €0.233/hour/instance - For the 7-8 minutes during which my job is running and therefore consuming both DBUs and VMs. (https://azure.microsoft.com/en-in/pricing/details/databricks/)

2. €0.0252/hour/instance - For the remaining 22-23 minutes, when my instances are idle and no DBUs are consumed. (https://azure.microsoft.com/en-in/pricing/details/virtual-machines/linux/#pricing )
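For reference, the estimate described in the two points above can be written out as a small calculation. This is only a sketch of the model in the post, not actual Azure billing logic: the two rates are the ones quoted above, while `HOURS_PER_MONTH = 730`, the 7.5-minute run time (midpoint of 7-8 minutes), and all function names are assumptions made for illustration. It deliberately leaves out disk (storage) and cluster start-up costs.

```python
# Rates quoted in the post (EUR per hour per instance).
RUNNING_RATE = 0.233   # DBU + VM while the job is running
IDLE_RATE = 0.0252     # spot VM only, while the instance sits idle in the pool

INSTANCES = 3          # 1 driver + 2 workers
RUN_MINUTES = 7.5      # assumption: midpoint of the 7-8 minute run
CYCLE_MINUTES = 30     # the job is triggered every 30 minutes
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def hourly_cost_per_instance(run_minutes=RUN_MINUTES):
    """Blend the running and idle rates over one 30-minute cycle."""
    run_fraction = run_minutes / CYCLE_MINUTES
    return run_fraction * RUNNING_RATE + (1 - run_fraction) * IDLE_RATE


def monthly_estimate(instances=INSTANCES):
    """Monthly cost of this model for the whole cluster (VM + DBU only)."""
    return hourly_cost_per_instance() * instances * HOURS_PER_MONTH


if __name__ == "__main__":
    print(f"per instance per hour: EUR {hourly_cost_per_instance():.4f}")
    print(f"per month, {INSTANCES} instances: EUR {monthly_estimate():.2f}")
```

Comparing this figure with the actual bill shows how much of the gap has to come from items the model does not include.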

When I calculate this at the monthly level, there is a huge difference between my estimated and actual costs.

Am I missing anything? One thing that I don't understand is the disk (storage) cost associated with the Azure VMs.

I am happy to share more information as needed, but can someone please help me understand the detailed cost?

3 REPLIES

-werners-
Esteemed Contributor III

If you keep instances warm (online but not doing anything), you pay for them. You do not pay DBUs, but MS will bill for every second they are running. This can become expensive, even with spot pricing, if you keep them online for an extended period of time.
So basically what you do is eliminate the DBU cost, but not the hardware cost.

Storage is another story. The total VM cost consists of CPU and RAM, plus persistent storage, which MS bills separately.
This storage can be HDD or SSD, depending on the VM type. HDD storage is cheaper (but slower).
DS3_v2 uses SSD storage. If you do not need SSD storage, you can use D3 instead of DS3 (I use these all the time).

sanket-kelkar
New Contributor II

Thank you for your reply!

Some follow-up points -

1. Warm instances - Correct! Once the job is completed, it will not charge DBUs but will continue to charge hardware costs. The Azure VM pricing for DS3V2 Spot instances is €0.0252/hour. In my case, an instance is warm for 22 minutes out of every 30, i.e. 44 minutes of every hour, so the idle cost would be about €0.018 per hour per instance (€0.0252 × 44/60), and I think that's pretty OK in my case. Am I calculating it correctly?

2. Regarding the HDD disk - Thank you for the suggestion! I quickly checked the VM prices of DS3V2 and D3V2, and they are the same for Spot instances, i.e. €0.0252/hour. In terms of storage, however, D3V2 would create osDisk (Tier S4 - 30GB) and containerRootVolume (Tier S15 - 256GB) disks, which would be charged at €1.42/month and €10.47/month respectively. This is lower than what I am currently paying.
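The arithmetic in the two points above can be checked with a short sketch. The spot rate and the two disk prices are the figures quoted in this reply. Everything else is an assumption for illustration: that all 3 pool instances stay warm, that each instance gets its own pair of disks billed for the full month, the 730-hour month, and the function names.

```python
SPOT_RATE = 0.0252           # EUR/hour/instance, spot price quoted above
IDLE_MINUTES_PER_HOUR = 44   # 22 idle minutes in every 30-minute cycle
INSTANCES = 3                # assumption: all three pool instances stay warm
HOURS_PER_MONTH = 730        # assumption: average number of hours in a month

OS_DISK = 1.42               # EUR/month, osDisk (S4, 30 GB)
CONTAINER_DISK = 10.47       # EUR/month, containerRootVolume (S15, 256 GB)


def idle_vm_cost_per_hour(instances=1):
    """Hardware cost of warm (idle) instances, with no DBUs charged."""
    return SPOT_RATE * IDLE_MINUTES_PER_HOUR / 60 * instances


def disk_cost_per_month(instances=1):
    """Assumes each instance carries both disks for the whole month."""
    return (OS_DISK + CONTAINER_DISK) * instances


if __name__ == "__main__":
    idle_month = idle_vm_cost_per_hour(INSTANCES) * HOURS_PER_MONTH
    print(f"idle VM, 1 instance, per hour: EUR {idle_vm_cost_per_hour():.5f}")
    print(f"idle VM, {INSTANCES} instances, per month: EUR {idle_month:.2f}")
    print(f"disks, {INSTANCES} instances, per month: EUR {disk_cost_per_month(INSTANCES):.2f}")
```

Under these assumptions the monthly disk charge is of the same order as the idle spot VM charge, so it should not be left out of the estimate.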

 

-werners-
Esteemed Contributor III

Don't forget the startup time, which is also billed.

My experience is that costs go up due to lots of instances being kept warm (a few is not really an issue) and premium storage. The latter especially can make a huge difference; I learned that the hard way.
