Hi All,
We are using the Azure Databricks platform for one of our data engineering needs. Here's my setup:
1. Job compute with a cluster of 1 driver and 2 workers, all of type 'Standard_DS3_v2' (Photon is disabled).
2. The job compute takes its instances from an instance pool, since we want to reduce the cluster start-up time. The instance pool uses the "All spot" setting and keeps 3 instances idle.
How do I run the job?
1. The job is run via workflows every 30 minutes. It takes 7 to 8 minutes to complete.
The cost of this setup?
Based on my research, I have come up with the following cost estimate:
1. €0.233/hour/instance for the 7-8 minutes during which my job is running and therefore consuming both DBUs and VMs. (https://azure.microsoft.com/en-in/pricing/details/databricks/)
2. €0.0252/hour/instance for the remaining 22-23 minutes, when my instances are idle and no DBUs are consumed. (https://azure.microsoft.com/en-in/pricing/details/virtual-machines/linux/#pricing)
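The estimate above can be written out as simple arithmetic. This is only a rough sketch: it assumes a 30-day month, an average run time of 7.5 minutes per 30-minute cycle, and that all 3 instances are billed at each of the two rates quoted above. The function and parameter names are illustrative, and the rates are the figures from this post, not verified prices.

```python
def estimate_monthly_cost(
    instances=3,          # 1 driver + 2 workers
    run_minutes=7.5,      # assumed average of the 7-8 minute run time
    cycle_minutes=30,     # the job is triggered every 30 minutes
    active_rate=0.233,    # EUR/hour/instance while the job runs (figure from this post)
    idle_rate=0.0252,     # EUR/hour/instance while idle in the pool (figure from this post)
    days=30,              # assumed month length
):
    """Return the estimated monthly cost in EUR under the assumptions above."""
    cycles_per_day = 24 * 60 / cycle_minutes
    active_hours = run_minutes / 60
    idle_hours = (cycle_minutes - run_minutes) / 60
    # Cost of one 30-minute cycle across all instances.
    cost_per_cycle = instances * (
        active_rate * active_hours + idle_rate * idle_hours
    )
    return cost_per_cycle * cycles_per_day * days


if __name__ == "__main__":
    print(f"Estimated monthly cost: EUR {estimate_monthly_cost():.2f}")
```

Any charge not captured by these two hourly rates is left out of this figure, so it is a lower bound on the estimate rather than a full bill.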
When I calculate this at the monthly level, there is a very large difference between my estimated and actual costs.
Am I missing anything? One thing that I don't understand is the disk (storage) cost associated with the Azure VMs.
I am happy to share more information as needed, but can someone please help me understand the detailed cost breakdown?