<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks costing - Need details of the Azure VM costs in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44728#M27702</link>
    <description>&lt;P&gt;Don't forget the startup time, which is also billed.&lt;/P&gt;&lt;P&gt;In my experience, costs go up because of lots of instances being kept warm (a few is not really an issue) and because of premium storage.&amp;nbsp; The latter especially can make a huge difference; I learned that the hard way.&lt;/P&gt;</description>
    <pubDate>Thu, 14 Sep 2023 09:29:32 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2023-09-14T09:29:32Z</dc:date>
    <item>
      <title>Databricks costing - Need details of the Azure VM costs</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44692#M27694</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;We are using the Azure Databricks platform for one of our data engineering needs. Here's my setup:&lt;/P&gt;&lt;P&gt;1. Job compute that uses a cluster of 1 driver and 2 workers, all of type '&lt;SPAN&gt;Standard_DS3_v2' (Photon is disabled).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2. The job compute takes its instances from an instance pool, since we want to reduce the cluster start-up time. The instance pool uses the "All spot" setting and keeps 3 instances idle.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;How do I run the job?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1. The job is run via Workflows every 30 minutes. It takes 7 to 8 minutes to complete.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The cost of this setup?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Based on my research, I have come up with the cost estimate below:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1.&amp;nbsp;&lt;SPAN class=""&gt;€0.233&lt;/SPAN&gt;/hour/instance for the 7-8 minutes during which my job is running and thus consuming both DBUs and VMs. (&lt;A href="https://azure.microsoft.com/en-in/pricing/details/databricks/" target="_blank" rel="noopener"&gt;https://azure.microsoft.com/en-in/pricing/details/databricks/&lt;/A&gt;)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;2.&amp;nbsp;&lt;SPAN class=""&gt;€0.0252&lt;/SPAN&gt;/hour/instance for the remaining 22-23 minutes, when my instances are idle and no DBUs are consumed. (&lt;A href="https://azure.microsoft.com/en-in/pricing/details/virtual-machines/linux/#pricing" target="_blank" rel="noopener"&gt;https://azure.microsoft.com/en-in/pricing/details/virtual-machines/linux/#pricing&lt;/A&gt;)&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;When I calculate this at the monthly level, there is a large difference between my estimated and actual costs.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Am I missing anything?
One thing that I don't understand is the disk (storage) cost associated with the Azure VMs.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am happy to share more information as needed, but can someone please help me understand the detailed cost?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Sep 2023 05:37:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44692#M27694</guid>
      <dc:creator>sanket-kelkar</dc:creator>
      <dc:date>2023-09-14T05:37:16Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks costing - Need details of the Azure VM costs</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44716#M27696</link>
      <description>&lt;P&gt;If you keep instances warm (online but not doing anything), you pay for them. You do not pay DBUs, but MS bills for every second they are running.&amp;nbsp; This can become expensive, even with spot pricing, if you keep them online for an extended period of time.&lt;BR /&gt;So basically, what you do is rule out the DBU cost, but not the hardware cost.&lt;BR /&gt;&lt;BR /&gt;Storage is another story.&amp;nbsp; The VM cost consists of CPU and RAM, but also persistent storage (and MS bills these separately).&lt;BR /&gt;This storage can be HDD or SSD.&amp;nbsp; Which one is used depends on the VM type; HDD is cheaper (but slower).&lt;BR /&gt;DS3_v2 uses SSD storage.&amp;nbsp; If you do not need SSD storage, you can use D3 instead of DS3 (I use these all the time).&lt;/P&gt;</description>
      <pubDate>Thu, 14 Sep 2023 08:03:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44716#M27696</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2023-09-14T08:03:30Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks costing - Need details of the Azure VM costs</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44726#M27701</link>
      <description>&lt;P&gt;Thank you for your reply!&lt;/P&gt;&lt;P&gt;Some follow-up points:&lt;/P&gt;&lt;P&gt;1. Warm instances: correct! Once the job has completed, DBUs are no longer charged, but the hardware cost continues. According to the Azure VM pricing, the cost of a DS3_v2 spot instance is&amp;nbsp;&lt;SPAN class=""&gt;€0.0252&lt;/SPAN&gt;&lt;SPAN&gt;/hour. In my case, an instance is warm for 22 minutes out of every 30, that is 44 minutes per hour, so the idle cost would be about&amp;nbsp;&lt;SPAN class=""&gt;€0.018 per instance per hour, and I think that's pretty OK in my case. Am I calculating it correctly?&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;2. Regarding the HDD disk: thank you for the suggestion! I quickly checked the VM prices of DS3_v2 and D3_v2; they are the same for spot instances, i.e.&amp;nbsp;€0.0252/hour. In terms of storage, however, D3_v2 would create&amp;nbsp;&lt;STRONG&gt;osDisk (Tier S4 - 30GB)&lt;/STRONG&gt;&amp;nbsp;and &lt;STRONG&gt;containerRootVolume&lt;/STRONG&gt;&amp;nbsp;&lt;STRONG&gt;(Tier S15 - 256GB)&lt;/STRONG&gt; disks, which would cost&amp;nbsp;€1.42/month and&amp;nbsp;€10.47/month respectively. This is lower than what I am currently paying.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Sep 2023 09:21:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44726#M27701</guid>
      <dc:creator>sanket-kelkar</dc:creator>
      <dc:date>2023-09-14T09:21:13Z</dc:date>
    </item>
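    <!--
    Editorial sketch, not part of any post: the estimate discussed in the thread so far
    can be reproduced with a few lines of Python. All rates are the figures quoted by the
    posters (EUR), not current Azure prices. The 16 running minutes per hour (two runs of
    about 8 minutes), the 730 hour month, and the function names are assumptions made for
    illustration. Cluster start-up time and any other billed items are not modelled.

```python
# Rates quoted in the thread (EUR); assumptions, not verified Azure prices.
SPOT_VM_RATE = 0.0252   # per instance-hour, DS3_v2 spot VM only (warm, no DBUs)
RUN_RATE = 0.233        # per instance-hour while the job runs (poster's DBU + VM figure)
DISK_MONTHLY = {        # per instance per month, standard HDD tiers named in the thread
    "osDisk_S4": 1.42,
    "containerRootVolume_S15": 10.47,
}


def hourly_cost(run_minutes: float, idle_minutes: float) -> float:
    """Per-instance cost for one hour split into running and warm-idle minutes."""
    return RUN_RATE * run_minutes / 60 + SPOT_VM_RATE * idle_minutes / 60


def monthly_cost(instances: int, run_minutes: float, idle_minutes: float,
                 hours_per_month: float = 730) -> float:
    """Compute cost for all instances over a month plus their managed disks."""
    compute = hourly_cost(run_minutes, idle_minutes) * hours_per_month * instances
    disks = sum(DISK_MONTHLY.values()) * instances
    return compute + disks


# Warm-idle share only: 44 idle minutes per hour at the spot VM rate.
idle_only = SPOT_VM_RATE * 44 / 60
print(f"warm idle cost per instance-hour: EUR {idle_only:.4f}")
print(f"estimated month for 3 instances: EUR {monthly_cost(3, 16, 44):.2f}")
```

    Comparing the printed monthly figure with the actual invoice shows how much of the gap
    is left to explain by items outside this model, such as start-up time or premium disks.
    -->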
    <item>
      <title>Re: Databricks costing - Need details of the Azure VM costs</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44728#M27702</link>
      <description>&lt;P&gt;Don't forget the startup time, which is also billed.&lt;/P&gt;&lt;P&gt;In my experience, costs go up because of lots of instances being kept warm (a few is not really an issue) and because of premium storage.&amp;nbsp; The latter especially can make a huge difference; I learned that the hard way.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Sep 2023 09:29:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/44728#M27702</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2023-09-14T09:29:32Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks costing - Need details of the Azure VM costs</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/73744#M34660</link>
      <description>&lt;P&gt;There are two ways to calculate the real cost of an Azure cluster or job: do it yourself by querying the Microsoft Cost API and the Databricks API and then combining the information to get the exact cost, or use a tool such as KopiCloud Databricks Costs at &lt;A href="https://databrickscost.kopicloud.com/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://databrickscost.kopicloud.com&lt;/A&gt; to calculate the cost in seconds.&lt;/P&gt;&lt;P&gt;Guillermo&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2024 07:53:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-costing-need-details-of-the-azure-vm-costs/m-p/73744#M34660</guid>
      <dc:creator>GuillermoM</dc:creator>
      <dc:date>2024-06-13T07:53:40Z</dc:date>
    </item>
  </channel>
</rss>

