Extracting cost by user (run_by) for All-purpose clusters and SQL warehouse usage

devyani_k — Tue, 01 Jul 2025 02:30:51 GMT

Hi,

I'm trying to extract usage cost per user (run_by) for workloads that utilize all-purpose clusters and SQL warehouses. I’ve been exploring the system.billing.usage table but noticed some challenges:

1. For records related to all-purpose clusters and SQL warehouses, the identity_metadata column often has null values for the run_as key.
2. The usage_metadata column also lacks identifying information like job_id, job_run_id, notebook_id, or job_name, making it hard to determine how the compute was used.

While joining system.billing.usage with system.access.audit on cluster_id and event_date helps retrieve some additional context, there are still many rows with no user info (run_by and run_as are both null).

Given that both all-purpose clusters and SQL warehouses can be used by multiple users (including ad hoc usage), I’m trying to determine:

1. Is there a reliable way to distinguish whether a usage entry was triggered via a job or by individual user activity?
2. More importantly, is there any way to consistently populate run_by or run_as for all-purpose cluster and SQL warehouse entries, so we can compute cost per user accurately?

Any insights, best practices, or workarounds would be appreciated.

Thanks in advance.

Re: Extracting cost by user (run_by) for All-purpose clusters and SQL warehouse usage

Louis_Frolio — Tue, 01 Jul 2025 12:02:02 GMT

Per-user attribution of compute usage is only partially supported for all-purpose clusters and SQL warehouses. Job compute (including serverless jobs) and workflows can be reliably attributed to the job owner or service principal. For interactive workloads and shared resources, attribution remains an estimate, and not every record can be tied to a user.

Best practice: use job- and cluster-level billing, join with access and activity event logs to approximate per-user usage where needed, and use the newer tagging and budget features for future workloads. Direct cluster or per-query billing is not available today, and audited, high-precision cost per user is currently not feasible for all usage scenarios.
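To make the "approximation" idea concrete, here is a minimal Python sketch of one possible approach: split each resource's daily cost across the users seen in the activity logs for that resource and day, in proportion to their event counts. This is an estimate, not an official attribution method. The function name, the dictionary keys (resource_id, usage_date, event_date, user, events, cost), and the UNATTRIBUTED bucket are all hypothetical; they assume you have already exported per-resource daily cost from the billing data and per-user event counts from the audit data.

```python
from collections import defaultdict


def apportion_cost(usage_rows, activity_rows):
    """Split each (resource, day) cost across users in proportion to activity.

    All field names here are hypothetical placeholders, not system table columns.
    """
    # Count activity events per (resource, day), broken down by user.
    counts = defaultdict(lambda: defaultdict(int))
    for a in activity_rows:
        key = (a["resource_id"], a["event_date"])
        counts[key][a["user"]] += a.get("events", 1)

    cost_by_user = defaultdict(float)
    for u in usage_rows:
        users = counts.get((u["resource_id"], u["usage_date"]))
        if not users:
            # No user activity found for this resource and day: keep the cost
            # visible rather than silently dropping it.
            cost_by_user["UNATTRIBUTED"] += u["cost"]
            continue
        total_events = sum(users.values())
        for user, n in users.items():
            cost_by_user[user] += u["cost"] * n / total_events
    return dict(cost_by_user)


# Made-up sample data: one cluster with activity, one warehouse without.
usage = [
    {"resource_id": "cluster-1", "usage_date": "2025-06-30", "cost": 10.0},
    {"resource_id": "warehouse-1", "usage_date": "2025-06-30", "cost": 4.0},
]
activity = [
    {"resource_id": "cluster-1", "event_date": "2025-06-30", "user": "alice", "events": 3},
    {"resource_id": "cluster-1", "event_date": "2025-06-30", "user": "bob", "events": 1},
]

result = apportion_cost(usage, activity)
print(result)
```

Event counts are only a rough proxy for consumption (one heavy query can outweigh many light ones), so if your logs expose something closer to actual load, such as query duration, that would be a better weight. Tracking the UNATTRIBUTED total also shows how much of the bill the logs cannot explain.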

Hope this helps, Lou.
