Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks Business Dashboards - Interactive cluster total dollar spend

Data_NXT
New Contributor II

I'm working on Databricks Business Dashboards and trying to calculate interactive cluster compute time and total dollar spend per workspace.

As a standard breakdown, total dollar spend = Interactive Clusters + Job Clusters + SQL Warehouses.

I'm able to fetch total spend by joining system.billing.usage with the price references in system.billing.list_prices. However, to isolate Interactive Cluster usage only, is there any Databricks-recommended filter (e.g., cluster type, tags, or system fields) that cleanly segregates interactive clusters from job clusters or SQL warehouses?

Appreciate any documented best practices or SQL-level filtering logic you've used successfully.

2 ACCEPTED SOLUTIONS

nayan_wylde
Honored Contributor III

@Data_NXT You can look for the following filters.

1. sku_name: STANDARD_ALL_PURPOSE_COMPUTE or PREMIUM_ALL_PURPOSE_COMPUTE (the all-purpose/interactive compute SKUs).

2. usage_metadata.cluster_id: present for non-serverless compute, including interactive clusters. The absence of usage_metadata.job_id and usage_metadata.warehouse_id further confirms the record did not come from a job run or a SQL warehouse.

%sql
-- Interactive (all-purpose) cluster usage only, from the billing system table
SELECT
  workspace_id,
  usage_date,
  usage_quantity,
  sku_name,
  usage_metadata.cluster_id,
  identity_metadata.run_as
FROM system.billing.usage
WHERE sku_name IN ('STANDARD_ALL_PURPOSE_COMPUTE', 'PREMIUM_ALL_PURPOSE_COMPUTE')
  AND usage_metadata.cluster_id IS NOT NULL   -- non-serverless compute with a cluster attached
  AND usage_metadata.job_id IS NULL           -- excludes job clusters
  AND usage_metadata.warehouse_id IS NULL     -- excludes SQL warehouses
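
To turn the DBU quantity above into the per-workspace dollar figure the original question asks about, a minimal sketch is to join the same filtered usage against system.billing.list_prices. This assumes the documented list-prices schema (pricing.default carrying the list price); verify the column names in your account, and note it yields list-price cost only, not negotiated discounts or cloud-side infrastructure charges.

%sql
-- Sketch: estimated list-price dollar spend for interactive clusters, per workspace per month.
-- Assumes the documented system.billing.list_prices schema (pricing.default = list price);
-- contract discounts and cloud VM/storage/networking costs are NOT reflected here.
SELECT
  u.workspace_id,
  date_trunc('MONTH', u.usage_date) AS usage_month,
  SUM(u.usage_quantity * lp.pricing.default) AS estimated_list_cost
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON u.cloud = lp.cloud
  AND u.sku_name = lp.sku_name
  AND u.usage_unit = lp.usage_unit
  AND u.usage_start_time >= lp.price_start_time
  AND (u.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
WHERE u.sku_name IN ('STANDARD_ALL_PURPOSE_COMPUTE', 'PREMIUM_ALL_PURPOSE_COMPUTE')
  AND u.usage_metadata.cluster_id IS NOT NULL
  AND u.usage_metadata.job_id IS NULL
  AND u.usage_metadata.warehouse_id IS NULL
GROUP BY u.workspace_id, date_trunc('MONTH', u.usage_date)
ORDER BY usage_month, estimated_list_cost DESC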

 


nayan_wylde
Honored Contributor III

Also, note that the system tables alone will not give you the exact dollar amount you spend on interactive compute.

Here is the cost breakdown for running interactive compute:

Component: description (cost source)
- DBU cost: based on workload type and tier (Databricks)
- VM cost: based on instance type and region (Azure/AWS)
- Storage: blob + managed disk (Azure/AWS)
- Networking: NAT gateway, private endpoints (Azure/AWS)
- Discounts: DBCU, reserved VM instances, spot (Databricks + Azure/AWS)
Best practices for accurate cost attribution:
    1. Use the system.billing.usage table to track DBU consumption.
    2. Join with Azure Cost Management data to get VM and infra costs.
    3. Apply filters for workload type (sku_name, cluster_id, etc.) to isolate cluster types (see the sketch after this list).
    4. Monitor discount application via the Azure Reservations portal.
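
To sanity-check the "interactive + jobs + SQL warehouses" breakdown from the original question, a rough sketch like the one below groups DBU usage by compute category per workspace. The sku_name LIKE patterns are illustrative assumptions, so verify them against the SKU names that actually appear in your account.

%sql
-- Sketch: DBU quantity split by compute category per workspace.
-- The LIKE patterns are illustrative assumptions; check the actual sku_name values
-- in your account before relying on them.
SELECT
  workspace_id,
  CASE
    WHEN sku_name LIKE '%ALL_PURPOSE%' THEN 'Interactive (all-purpose)'
    WHEN sku_name LIKE '%JOBS%'        THEN 'Jobs'
    WHEN sku_name LIKE '%SQL%'         THEN 'SQL Warehouse'
    ELSE 'Other'
  END AS compute_category,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
GROUP BY 1, 2
ORDER BY 1, 3 DESC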


3 REPLIES


That was really very helpful. Thanks again.
