Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks costing - Need details of the Azure VM costs

sanket-kelkar
New Contributor II

Hi All,

We are using the Azure Databricks platform for one of our data engineering needs. Here's my setup:

1. Job compute that uses a cluster of 1 driver and 2 workers, all of type 'Standard_DS3_v2' (Photon is disabled).

2. The job compute takes its instances from an instance pool, since we want to reduce cluster start-up time. The instance pool uses the "All spot" setting and keeps 3 instances idle.

How do I run the job?

1. The job is run via Workflows every 30 minutes and takes 7 to 8 minutes to complete (an approximate sketch of this setup follows below).
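For reference, here is an approximate sketch of this setup expressed as Databricks REST API payloads (field names follow the Instance Pools and Jobs APIs as I understand them; the names, IDs and cron expression are placeholders, not our real values):

# Instance pool: "All spot", 3 idle instances kept warm
instance_pool = {
    "instance_pool_name": "ds3v2-spot-pool",
    "node_type_id": "Standard_DS3_v2",
    "min_idle_instances": 3,
    "azure_attributes": {"availability": "SPOT_AZURE"},
}

# Job: 1 driver + 2 workers from the pool, Photon disabled, triggered every 30 minutes
job = {
    "name": "half-hourly-etl",
    "schedule": {
        "quartz_cron_expression": "0 0/30 * * * ?",
        "timezone_id": "UTC",
    },
    "job_clusters": [{
        "job_cluster_key": "main",
        "new_cluster": {
            "num_workers": 2,
            "instance_pool_id": "<pool-id>",
            "driver_instance_pool_id": "<pool-id>",
            "runtime_engine": "STANDARD",
            "spark_version": "<dbr-version>",
        },
    }],
}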

The cost of this setup?

Based on my research, I have come up with the cost estimation below:

1. €0.233/hour/instance - for the 7-8 minutes during which my job is running, thus consuming both DBUs and VM time. (https://azure.microsoft.com/en-in/pricing/details/databricks/)

2. €0.0252/hour/instance - for the remaining 22-23 minutes, where my instances are idle and no DBUs are consumed. (https://azure.microsoft.com/en-in/pricing/details/virtual-machines/linux/#pricing)
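As a rough back-of-the-envelope version of that estimate (the two rates are the ones quoted above; the 3-node count comes from the cluster size, and the ~7.5-minute average runtime and 730 hours per month are assumptions of mine):

# Rough monthly estimate for the setup above
NODES = 3                    # 1 driver + 2 workers, Standard_DS3_v2
RUN_RATE = 0.233             # EUR/hour/instance while the job runs (VM + DBU)
IDLE_RATE = 0.0252           # EUR/hour/instance while idle in the spot pool (VM only)
RUN_MINUTES = 7.5            # assumed average of the 7-8 minute runtime
CYCLE_MINUTES = 30           # the job is triggered every 30 minutes
HOURS_PER_MONTH = 730        # assumed average month

cycles = HOURS_PER_MONTH * 60 / CYCLE_MINUTES
run_hours = cycles * RUN_MINUTES / 60
idle_hours = cycles * (CYCLE_MINUTES - RUN_MINUTES) / 60

estimate = NODES * (run_hours * RUN_RATE + idle_hours * IDLE_RATE)
print(f"Estimated monthly compute cost: EUR {estimate:.2f}")

Note that this deliberately leaves out the managed disk (storage) cost mentioned below.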

When I calculate this at the monthly level, there is a large difference between my estimated and actual costs.

Am I missing anything? One thing that I don't understand is the disk (storage) cost associated with the Azure VMs.

I am happy to share more information as needed, but can someone please help me understand the detailed cost?

4 REPLIES

-werners-
Esteemed Contributor III

If you keep instances warm (online but not doing anything), you pay for them. You do not pay DBUs, but Microsoft will bill you for every second they are running. This can become expensive, even with spot pricing, if you keep them online for an extended period.
So basically you rule out the DBU cost, but not the hardware cost.

Storage is another story. The VM cost consists of CPU and RAM, but also persistent storage, and Microsoft bills these separately.
This storage can be HDD or SSD; which one is used depends on the VM type, and HDD storage is cheaper (but slower).
DS3_v2 uses SSD storage. If you do not need SSD storage, you can use D3 instead of DS3 (I use these all the time).

sanket-kelkar
New Contributor II

Thank you for your reply!

Some follow-up points -

1. Warm instances - Correct! Once the job is completed, it no longer incurs DBU charges but continues to incur hardware costs. Checking the Azure VM pricing for DS3_v2 spot instances, the cost is €0.0252/hour. In my case an instance is warm for 22 minutes in every 30-minute cycle, i.e. 44 minutes per hour, so the cost would be about €0.018 per hour, which I think is acceptable. Am I calculating this correctly? (See the sketch after this list.)

2. Regarding the HDD disk - Thank you for the suggestion! I quickly checked the VM prices of DS3_v2 and D3_v2, and they are the same for spot instances, i.e. €0.0252/hour. In terms of storage, with D3_v2 it would create an osDisk (Tier S4 - 30 GB) and a containerRootVolume (Tier S15 - 256 GB), charged at €1.42/month and €10.47/month respectively. This is lower than what I am currently paying.
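To make the arithmetic above concrete, a small sketch (the spot rate and the per-disk monthly prices are the figures quoted in this reply; multiplying by 3 instances is my assumption based on the pool settings):

# Idle spot-VM cost per instance, using the figures above
SPOT_RATE = 0.0252                   # EUR/hour per idle DS3_v2 / D3_v2 spot instance
IDLE_MINUTES_PER_HOUR = 44           # 22 idle minutes in each 30-minute cycle
idle_cost = SPOT_RATE * IDLE_MINUTES_PER_HOUR / 60
print(f"Idle VM cost: EUR {idle_cost:.4f} per instance per hour")   # ~0.0185

# Managed disk cost, billed per month whether the VM is busy or idle
OS_DISK = 1.42                       # EUR/month, osDisk, Tier S4 (30 GB)
CONTAINER_ROOT_VOLUME = 10.47        # EUR/month, containerRootVolume, Tier S15 (256 GB)
INSTANCES = 3                        # assumption: the pool keeps 3 instances (and their disks) alive
disk_cost = INSTANCES * (OS_DISK + CONTAINER_ROOT_VOLUME)
print(f"Managed disk cost: EUR {disk_cost:.2f} per month")          # ~35.67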

 

-werners-
Esteemed Contributor III

Don't forget the startup time, which is also billed.

In my experience, costs go up due to lots of instances being kept warm (a few is not really an issue) and due to premium storage. Especially the latter can make a huge difference; I learned that the hard way.

GuillermoM
New Contributor II

To calculate the real cost of an Azure cluster or job, there are two ways: DIY, which means querying the Microsoft Cost Management API and the Databricks API and then combining the information to get the exact cost, or using a tool such as KopiCloud Databricks Costs at https://databrickscost.kopicloud.com to calculate the cost in seconds. A minimal sketch of the DIY route follows below.

Guillermo
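For anyone trying the DIY route, a minimal sketch of querying the Azure Cost Management query endpoint with plain REST calls could look like the following. The scope, api-version, grouping dimension and token handling are assumptions and should be checked against the Cost Management documentation; the Databricks side (e.g. job run timings) would come from the Databricks REST API and is not shown here.

import requests

# Assumptions: an AAD bearer token with Cost Management read access is available,
# and we want actual month-to-date cost for the managed resource group of the workspace.
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<databricks-managed-resource-group>"
TOKEN = "<aad-bearer-token>"

scope = f"subscriptions/{SUBSCRIPTION}/resourceGroups/{RESOURCE_GROUP}"
url = f"https://management.azure.com/{scope}/providers/Microsoft.CostManagement/query"
body = {
    "type": "ActualCost",
    "timeframe": "MonthToDate",
    "dataset": {
        "granularity": "Daily",
        "aggregation": {"totalCost": {"name": "Cost", "function": "Sum"}},
        # Group by meter category to separate VM, disk and Databricks DBU charges
        "grouping": [{"type": "Dimension", "name": "MeterCategory"}],
    },
}

resp = requests.post(
    url,
    params={"api-version": "2023-03-01"},   # assumed API version, adjust as needed
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=body,
)
resp.raise_for_status()
for row in resp.json()["properties"]["rows"]:
    print(row)   # cost, date and meter category per the requested grouping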
