07-17-2025 08:14 PM
Hi.
I can see in our Azure cost analysis tool that a not insignificant part of our costs comes from the managed Databricks RG deployed with the workspace, and that it relates particularly to VMs (so compute, I assume?) and storage, the latter of which, though I own the RG, I am not allowed to dive into. Note that we use a separate ADLS Gen2 account for storage of data.
We use dedicated clusters, job clusters, and a serverless SQL DW. How can I find out which of these to "tune" in order to bring down the costs of the VMs deployed as infra for the compute?
By the way, this enforcing of a label is rather daft when I cannot choose one that actually pertains to the matter at hand.
07-17-2025 08:36 PM
You can do a couple of things:
1. Explore the system tables in Databricks for more granular detail on the billing.
2. Add tags to your resources while creating them in Databricks. This will help you attribute spend at a more granular level.
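To make point 1 concrete, here is a minimal Python sketch of the kind of tag-based rollup you would do over `system.billing.usage`. The SKU names, tag keys, and $/DBU prices below are illustrative assumptions, not real rates; actual prices live in `system.billing.list_prices`.

```python
# Hedged sketch: aggregate estimated DBU spend per custom tag, mimicking
# what a query against system.billing.usage joined with
# system.billing.list_prices would return. All values are made up.
from collections import defaultdict

# Hypothetical rows: (sku_name, usage_quantity_in_dbus, custom_tags)
usage_rows = [
    ("PREMIUM_JOBS_COMPUTE",        12.0, {"team": "analytics"}),
    ("PREMIUM_ALL_PURPOSE_COMPUTE",  5.5, {"team": "analytics"}),
    ("PREMIUM_SQL_PRO_COMPUTE",      8.0, {"team": "finance"}),
]

# Hypothetical $/DBU list prices (look up the real ones for your SKUs).
price_per_dbu = {
    "PREMIUM_JOBS_COMPUTE": 0.15,
    "PREMIUM_ALL_PURPOSE_COMPUTE": 0.55,
    "PREMIUM_SQL_PRO_COMPUTE": 0.55,
}

def spend_by_tag(rows, prices, tag_key):
    """Sum estimated DBU cost per value of a given custom tag."""
    totals = defaultdict(float)
    for sku, dbus, tags in rows:
        totals[tags.get(tag_key, "untagged")] += dbus * prices[sku]
    return dict(totals)

print(spend_by_tag(usage_rows, price_per_dbu, "team"))
```

In practice you would run the equivalent join as SQL inside the workspace; the sketch just shows the shape of the rollup.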
07-20-2025 06:24 PM
Hello @_G_ .
Those would be Databricks costs. I am trying to tie those to the Databricks-related Azure costs, for which you cannot find any data in the system tables. The VMs in the Databricks managed RG seem to be automatically provisioned by Databricks to support compute workloads, and are billed separately (by Azure). I am trying to figure out which compute these VMs belong to, so that I can tune it to reduce the costs incurred by their provisioning.
07-20-2025 11:16 PM
I have been through this and spoke with our AM at Databricks. In some analyses you'll see the VM costs exposed, but in the Azure subscription costs screen they are hidden, as they are rolled up into one of the higher-order costs: SQL Warehouse, or Serverless running pipelines, depending on the usage. If it is SQL Warehouse or Serverless for pipelines, you don't need to concern yourself with them unless you want to tag and distribute internal bills for pipelines and costs. Beware if you're pulling costs from the Azure API for Databricks: you can see some anomalies. I have seen them in someone else's work, but I haven't yet had time to drill into how they got the erroneous results.
Have you got the Databricks costs dashboard running?
07-20-2025 11:48 PM
To optimize Databricks compute costs, focus on adjusting the cluster types and scaling options. For example, using auto-scaling on job clusters and considering spot instances for dedicated clusters can significantly reduce VM costs.
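As a hedged illustration of the auto-scaling and spot-instance suggestion above, here is what such a cluster definition might look like as a Databricks Clusters API payload, written as a Python dict. The cluster name, runtime version, and node size are assumptions you would replace with your own; the `azure_attributes` field names follow the Databricks REST API.

```python
# Hedged sketch of a Clusters API payload combining autoscaling with
# Azure spot instances. Names, sizes, and limits are illustrative.
cluster_spec = {
    "cluster_name": "cost-tuned-job-cluster",    # hypothetical name
    "spark_version": "15.4.x-scala2.12",         # example LTS runtime
    "node_type_id": "Standard_D4ds_v5",          # pick per workload
    "autoscale": {"min_workers": 1, "max_workers": 4},
    "azure_attributes": {
        "first_on_demand": 1,                    # keep the driver on-demand
        "availability": "SPOT_WITH_FALLBACK_AZURE",
        "spot_bid_max_price": -1,                # -1 = pay up to on-demand price
    },
    "autotermination_minutes": 20,               # stop idle VMs from billing
}
```

Fewer minimum workers, spot eviction with on-demand fallback, and aggressive auto-termination all shrink the time the underlying Azure VMs exist, which is what drives the managed-RG VM line item.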
07-21-2025 02:13 AM
Regarding the costs of infrastructure, you can find them in the Azure Portal under Cost Management. If you need detailed information about serverless and compute, you can import dashboards into your workspace from the Unity Catalog Account Console.
07-21-2025 11:09 PM
Thank you all for your replies. The issue is not getting an overview of costs - I already have that from the Cost Management Export function in Azure, and by using the system.billing tables in Databricks. The issue is understanding the relation between the different types of compute, and the VMs which are automatically provisioned in the Databricks managed RG to support the compute, for which we are billed separately.
I know how to "tune" a cluster itself to make it cheaper in terms of DBUs, but if I do not know the relation between the different types of compute (dedicated cluster, job cluster, SQL DW, etc.) and the aforementioned VMs, I cannot know whether the aggregate cost (Databricks compute + Azure infra) is actually lower, because I would only be optimizing one part of the equation (Databricks compute and *not* Azure infra, i.e. the VMs). Hopefully what I am asking is a bit clearer now.
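One way to build exactly that relation: Databricks applies default tags (such as `ClusterId`, `ClusterName`, `Creator`, `Vendor`) to the VMs and disks it provisions in the managed RG, so rows in the Azure Cost Management export can usually be joined back to the cluster that caused them. Below is a hedged Python sketch of that join, combining per-cluster DBU spend with the tagged infra rows; all cluster IDs and figures are made-up illustrations.

```python
# Hedged sketch: total cost per cluster = DBU cost (from system.billing
# tables) + Azure infra cost (VM/disk rows from the Cost Management
# export), joined on the ClusterId tag Databricks puts on the resources.
from collections import defaultdict

dbu_cost_by_cluster = {            # hypothetical, from system.billing.usage
    "0717-abc": 42.0,              # job cluster (made-up id)
    "0717-def": 30.0,              # all-purpose cluster (made-up id)
}

azure_export_rows = [              # hypothetical rows from the export CSV:
    # (cost_usd, resource_tags) for VM/disk resources in the managed RG
    (25.0, {"ClusterId": "0717-abc", "Vendor": "Databricks"}),
    (10.0, {"ClusterId": "0717-abc", "Vendor": "Databricks"}),
    (18.0, {"ClusterId": "0717-def", "Vendor": "Databricks"}),
]

def aggregate_cost(dbu_costs, infra_rows):
    """Combine DBU spend and tagged Azure infra spend per cluster."""
    total = defaultdict(float, dbu_costs)
    for cost, tags in infra_rows:
        cluster = tags.get("ClusterId")
        if cluster is not None:
            total[cluster] += cost
    return dict(total)

print(aggregate_cost(dbu_cost_by_cluster, azure_export_rows))
```

With that aggregate per cluster, you can tell whether a DBU-level tuning actually lowered the combined (DBU + VM) cost rather than just shifting it. Note that serverless compute is an exception: it runs in Databricks' own account, so it has no matching VM rows in your managed RG.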