10-21-2025 02:13 AM - edited 10-21-2025 02:15 AM
Hello Everyone,
We are currently working on a cost analysis initiative to gain deeper insights into our Databricks usage. As part of this effort, we are trying to calculate the hourly cost of each Databricks compute instance by utilizing the Azure Retail Prices API and storing this data in a table for further analysis.
During this process, we encountered a challenge in accurately calculating the cost per compute type. After consulting with Azure Support, we understood that the total cost displayed on Azure Databricks Pricing Page is composed of two components: the VM (infrastructure) cost and the DBU (Databricks Unit) cost.
While we are able to determine the VM cost using the Azure API, we are unable to map the correct number of DBUs per compute type to complete the total cost calculation. We have not found a reliable way to determine the DBU count associated with each VM/compute type.
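To make the goal concrete, this is roughly the calculation we are trying to assemble (a sketch only; vm_prices and dbu_rates are placeholder staging tables, with vm_prices fed from the Azure Retail Prices API and dbu_rates being the piece we cannot find):

SELECT
  v.instance_type,
  -- total hourly cost = infrastructure cost + (DBU rate * price per DBU)
  v.vm_price_per_hour + (r.dbus_per_hour * r.price_per_dbu) AS total_hourly_cost
FROM vm_prices AS v
JOIN dbu_rates AS r
  ON v.instance_type = r.instance_type;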
My questions to the community:
Is there a documented way to retrieve the DBU count per VM or compute type?
Is this information available through any Databricks APIs or system-level tables that we can query?
Has anyone built a similar cost model and can share tips or best practices?
Any guidance or pointers would be really appreciated!
Thanks in advance,
Charan.
10-21-2025 03:54 AM - edited 10-21-2025 03:55 AM
@saicharandeepb have you looked at the system billing tables in Databricks yet? https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/billing
There seems to be a field that can display the unit usage in DBU.
The same goes for this table as well:
https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/pricing
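If you eventually want a dollar figure rather than raw DBUs, the two tables can be joined. A rough sketch (assuming USD list prices and that pricing.effective_list.default is the right rate for your account):

SELECT
  u.usage_date,
  u.sku_name,
  SUM(u.usage_quantity) AS dbus,
  -- multiply consumed DBUs by the list price in effect at usage time
  SUM(u.usage_quantity * p.pricing.effective_list.default) AS approx_dbu_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
  ON u.sku_name = p.sku_name
 AND u.cloud = p.cloud
 AND u.usage_start_time >= p.price_start_time
 AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
WHERE u.usage_unit = 'DBU'
  AND p.currency_code = 'USD'
GROUP BY u.usage_date, u.sku_name
ORDER BY u.usage_date DESC;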
All the best,
BS
10-21-2025 06:32 AM
The majority of the relevant information can be found in the system.billing.usage and system.compute.clusters tables.
To view DBUs by instance type or compute, you can run the following query and explore more of the fields in these tables:
SELECT
  c.worker_node_type AS vm_instance,
  SUM(u.usage_quantity) AS total_dbus
FROM system.billing.usage AS u
JOIN (
  -- system.compute.clusters keeps one row per configuration change,
  -- so keep only the latest record per cluster to avoid double-counting
  SELECT cluster_id, worker_node_type
  FROM system.compute.clusters
  QUALIFY ROW_NUMBER() OVER (PARTITION BY cluster_id ORDER BY change_time DESC) = 1
) AS c
  ON u.usage_metadata.cluster_id = c.cluster_id
GROUP BY c.worker_node_type
ORDER BY total_dbus DESC;
10-21-2025 06:49 AM
1. Is there a documented way to retrieve the DBU count per VM or compute type?
Yes, but it's not directly exposed via a single API or table. The DBU consumption rate depends on:
Compute type (Jobs Compute, All-Purpose Compute, SQL Compute, etc.)
VM instance type (e.g., Standard_D16s_v3)
Databricks pricing tier (Standard, Premium, Enterprise)
Cloud provider (Azure, AWS, GCP)
Databricks provides DBU calculators and pricing matrices per cloud provider, which list DBU rates per instance type and workload type. For Azure, you can refer to the instance types pricing page: https://www.databricks.com/product/pricing/product-pricing/instance-types
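You can also pull the current $/DBU list rate per SKU from inside the workspace. A quick sketch (note this returns the price per DBU by SKU, not the DBU/hour rate per instance type):

SELECT
  sku_name,
  currency_code,
  usage_unit,
  pricing.effective_list.default AS price_per_dbu
FROM system.billing.list_prices
WHERE price_end_time IS NULL  -- current prices only
ORDER BY sku_name;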
2. Is this information available through any Databricks APIs or system-level tables that we can query?
Yes. DBU consumption is recorded directly in system.billing.usage:
SELECT
  usage_date,
  sku_name,
  usage_quantity AS dbus_consumed,
  usage_metadata.cluster_id,
  usage_metadata.job_id
FROM system.billing.usage
WHERE usage_unit = 'DBU'
ORDER BY usage_date DESC;
3. Has anyone built a similar cost model and can share tips or best practices?
Yes. Some of the best practices I follow are:
a month ago
Hi everyone, just to clarify my question: I'm looking for the DBU count per compute type (per instance type), not the total DBU consumption per workload.
In other words, I want to know the fixed DBU rate assigned to each compute SKU (for example, DS3 v2 = 0.75 DBU/hour, DS4 v2 = 1.5 DBU/hour, DS5 v2 = 3.0 DBU/hour, etc.) so that I can accurately estimate costs for different cluster configurations.
I'm not referring to usage or billing metrics that show total DBUs consumed by workloads over time; I just need the reference values that define how many DBUs are billed per hour for each compute type.
Thanks in advance for any guidance.
Thursday
Databricks doesn't expose a "DBU/hour per node type" lookup in system tables, but you can derive it empirically by dividing total DBUs recorded in the billable usage system table by total node runtime from the compute node timeline for each instance type. This gives you the effective DBU/hour per node type for your environment and SKU.
WITH node_minutes AS (
  SELECT
    node_type,
    COUNT(*) AS minutes  -- node_timeline is per-minute records
  FROM system.compute.node_timeline
  GROUP BY node_type
),
dbu_by_node AS (
  SELECT
    u.usage_metadata.node_type AS node_type,
    u.sku_name,
    SUM(u.usage_quantity) AS dbus
  FROM system.billing.usage AS u
  WHERE u.usage_unit = 'DBU'
    AND u.usage_metadata.node_type IS NOT NULL
  GROUP BY ALL
)
SELECT
  d.node_type,
  d.sku_name,
  d.dbus,
  m.minutes,
  -- caveat: minutes are not split by SKU, so if a node type runs under
  -- multiple SKUs the rate gets diluted; filter both CTEs to one workload
  -- (e.g. a date range or set of cluster_ids) for a cleaner estimate
  d.dbus / (m.minutes / 60.0) AS dbu_per_node_hour
FROM dbu_by_node AS d
JOIN node_minutes AS m USING (node_type)
ORDER BY dbu_per_node_hour DESC;
yesterday
You can also take a look at the built-in cost control dashboard, explained in the official Databricks documentation at https://docs.databricks.com/aws/en/admin/usage/. What I found most relevant about the dashboard is that you can inspect all the underlying system tables by opening its internal data model and queries.