a week ago
I'm working on Databricks Business Dashboards and trying to calculate interactive cluster compute time and total dollar spend per workspace.
As a standard breakdown, total dollar spend = Interactive Clusters + Job Clusters + SQL Warehouses.
I'm able to fetch total spend by joining system.billing.usage with the price references in system.billing.list_prices. However, to isolate Interactive Cluster usage only, is there a Databricks-recommended filter (e.g., cluster type, tags, or system fields) that cleanly separates Interactive Clusters from Job or SQL clusters?
Appreciate any documented best practices or SQL-level filtering logic you've used successfully.
a week ago - last edited a week ago
@Data_NXT You can look for the following filters.
1. sku_name: 'STANDARD_ALL_PURPOSE_COMPUTE' or 'PREMIUM_ALL_PURPOSE_COMPUTE' (all-purpose compute is what Databricks calls interactive clusters).
2. usage_metadata.cluster_id: present for non-serverless compute, including interactive clusters. The absence of usage_metadata.job_id and usage_metadata.warehouse_id further confirms the record is not from a job cluster or a SQL warehouse.
%sql
SELECT
  workspace_id,
  usage_date,
  usage_quantity,
  sku_name,
  usage_metadata.cluster_id,
  identity_metadata.run_as
FROM system.billing.usage
WHERE sku_name IN ('STANDARD_ALL_PURPOSE_COMPUTE', 'PREMIUM_ALL_PURPOSE_COMPUTE')
  AND usage_metadata.cluster_id IS NOT NULL
  AND usage_metadata.job_id IS NULL
  AND usage_metadata.warehouse_id IS NULL
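To turn those DBU quantities into an approximate dollar figure, you can join against system.billing.list_prices. This is only a sketch: it assumes list (undiscounted) pricing via the pricing.default field and matches prices by SKU, cloud, and effective date range, so it will not reflect any negotiated discounts on your invoice.

```sql
-- Sketch: estimated list-price spend for interactive (all-purpose) compute,
-- per workspace and day. Assumes no contract discounts apply.
SELECT
  u.workspace_id,
  u.usage_date,
  SUM(u.usage_quantity * lp.pricing.default) AS est_list_price_usd
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON u.sku_name = lp.sku_name
 AND u.cloud = lp.cloud
 -- pick the price row whose validity window covers the usage
 AND u.usage_start_time >= lp.price_start_time
 AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
WHERE u.sku_name IN ('STANDARD_ALL_PURPOSE_COMPUTE', 'PREMIUM_ALL_PURPOSE_COMPUTE')
  AND u.usage_metadata.cluster_id IS NOT NULL
  AND u.usage_metadata.job_id IS NULL
  AND u.usage_metadata.warehouse_id IS NULL
GROUP BY u.workspace_id, u.usage_date
ORDER BY u.workspace_id, u.usage_date;
```

The date-range predicate matters because list prices change over time; joining on sku_name alone can double-count usage against multiple price rows.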
a week ago
Also, note that the system tables alone will not give you the exact dollar amount spent on interactive compute: they cover only the Databricks DBU portion, while the underlying VM, storage, and networking costs appear on your cloud provider's bill.
Here is the cost breakdown for running interactive compute:

| Component  | Description                       | Cost Source            |
| ---------- | --------------------------------- | ---------------------- |
| DBU Cost   | Based on workload type and tier   | Databricks             |
| VM Cost    | Based on instance type and region | Azure/AWS              |
| Storage    | Blob + managed disk               | Azure/AWS              |
| Networking | NAT gateway, private endpoints    | Azure/AWS              |
| Discounts  | DBCU, Reserved VM, Spot           | Databricks + Azure/AWS |
a week ago
That was really very helpful. Thanks again.