Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Extracting cost by user (run_by) for All-purpose clusters and SQL warehouse usage

devyani_k
New Contributor

Hi,

I'm trying to extract usage cost per user (run_by) for workloads that utilize all-purpose clusters and SQL warehouses. I’ve been exploring the system.billing.usage table but noticed some challenges:

1. For records related to all-purpose clusters and SQL warehouses, the identity_metadata column often has null values for the run_as key.
2. The usage_metadata column also lacks identifying information like job_id, job_run_id, notebook_id, or job_name, making it hard to determine how the compute was used.

While joining system.billing.usage with system.access.audit on cluster_id and event_date helps retrieve some additional context, there are still many rows with no user info (run_by and run_as are both null).
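For context, this is roughly the join I've been using (a sketch only; field names like `request_params.cluster_id` inside the audit table's request parameters are assumptions and may differ depending on the event type and workspace):

```sql
-- Approximate per-user attribution for all-purpose cluster usage.
-- Assumes usage_metadata.cluster_id is populated on the billing side and
-- that the matching audit events carry the acting user's email.
SELECT
  u.usage_date,
  u.usage_metadata.cluster_id AS cluster_id,
  COALESCE(u.identity_metadata.run_as,
           a.user_identity.email) AS attributed_user,
  SUM(u.usage_quantity) AS dbus
FROM system.billing.usage u
LEFT JOIN system.access.audit a
  ON  a.request_params.cluster_id = u.usage_metadata.cluster_id
  AND a.event_date = u.usage_date
WHERE u.sku_name LIKE '%ALL_PURPOSE%'
GROUP BY ALL;
```

Even with this, many rows fall through with `attributed_user` as null, which is the gap I'm asking about.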

Given that both all-purpose clusters and SQL warehouses can be used by multiple users (including ad hoc usage), I’m trying to determine:

  1. Is there a reliable way to distinguish whether a usage entry was triggered via a job or individual user activity?

  2. More importantly, is there any way to consistently populate run_by or run_as for all-purpose cluster and SQL warehouse entries, so we can compute cost per user accurately?

Any insights, best practices, or workarounds would be appreciated.

Thanks in advance.

Accepted Solution

BigRoux
Databricks Employee

Attribution of compute usage to individual users is only partially supported for all-purpose clusters and SQL warehouses. Job compute (including serverless jobs) and workflows can be reliably attributed to the job owner or service principal. For interactive workloads on shared resources, attribution remains an estimate, and not every record can be tied to a user.

Best practices:

1. Rely on job- and cluster-level billing where it is available.
2. Join billing data with access and activity event logs to approximate per-user usage where needed.
3. Use the newer tagging and budget features to attribute future workloads up front.

Direct per-user or per-query billing is not available today, so high-precision, audited cost-per-user is currently not feasible for all usage scenarios.
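As a sketch of the job-versus-interactive split described above (assuming the usual `system.billing.usage` struct fields and `billing_origin_product` values; verify both against your workspace's schema before relying on this):

```sql
-- Classify each billing record and attribute what can be attributed.
-- Rows with a job_id come from job runs and can be tied to the job
-- owner; everything else is interactive/ad hoc usage.
SELECT
  CASE
    WHEN usage_metadata.job_id IS NOT NULL THEN 'job'
    ELSE 'interactive/ad hoc'
  END AS workload_type,
  COALESCE(identity_metadata.run_as, 'unattributed') AS run_as,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product IN ('ALL_PURPOSE', 'SQL')
GROUP BY ALL;
```

Whatever lands in the interactive, unattributed bucket is precisely the portion that cannot be tied to a single user today; for that slice, cluster tags and budget policies are the practical way to carve up cost going forward.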

 

Hope this helps, Lou.


