Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Extracting cost by user (run_by) for All-purpose clusters and SQL warehouse usage

devyani_k
New Contributor

Hi,

I'm trying to extract usage cost per user (run_by) for workloads that utilize all-purpose clusters and SQL warehouses. I’ve been exploring the system.billing.usage table but noticed some challenges:

1. For records related to all-purpose clusters and SQL warehouses, the identity_metadata column often has null values for the run_as key.
2. The usage_metadata column also lacks identifying information like job_id, job_run_id, notebook_id, or job_name, making it hard to determine how the compute was used.

While joining system.billing.usage with system.access.audit on cluster_id and event_date helps retrieve some additional context, there are still many rows with no user info (run_by and run_as are both null).
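For context, this is roughly the join I've been using (a sketch only; field names like `request_params.cluster_id` inside the audit table's request parameters are assumptions and may differ depending on the event type and workspace):

```sql
-- Approximate per-user attribution for all-purpose cluster usage.
-- Assumes usage_metadata.cluster_id is populated on the billing side and
-- that the matching audit events carry the acting user's email.
SELECT
  u.usage_date,
  u.usage_metadata.cluster_id AS cluster_id,
  COALESCE(u.identity_metadata.run_as,
           a.user_identity.email) AS attributed_user,
  SUM(u.usage_quantity) AS dbus
FROM system.billing.usage u
LEFT JOIN system.access.audit a
  ON  a.request_params.cluster_id = u.usage_metadata.cluster_id
  AND a.event_date = u.usage_date
WHERE u.sku_name LIKE '%ALL_PURPOSE%'
GROUP BY ALL;
```

Even with this, many rows fall through with `attributed_user` as null, which is the gap I'm asking about.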

Given that both all-purpose clusters and SQL warehouses can be used by multiple users (including ad hoc usage), I’m trying to determine:

  1. Is there a reliable way to distinguish whether a usage entry was triggered via a job or individual user activity?

  2. More importantly, is there any way to consistently populate run_by or run_as for all-purpose cluster and SQL warehouse entries, so we can compute cost per user accurately?

Any insights, best practices, or workarounds would be appreciated.

Thanks in advance.

Accepted Solution

BigRoux
Databricks Employee

Attribution of compute usage to individual users is only partially supported for all-purpose clusters and SQL warehouses. Job compute (including serverless jobs) and workflows can be reliably attributed to the job owner or service principal. For interactive workloads on shared resources, attribution remains an estimate, and not every record can be tied to a user.

Best practices:

1. Rely on job- and cluster-level billing where it is available.
2. Join billing data with access and activity event logs to approximate per-user usage where needed.
3. Use the newer tagging and budget features to attribute future workloads up front.

Direct per-user or per-query billing is not available today, so high-precision, audited cost-per-user is currently not feasible for all usage scenarios.
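As a sketch of the job-versus-interactive split described above (assuming the usual `system.billing.usage` struct fields and `billing_origin_product` values; verify both against your workspace's schema before relying on this):

```sql
-- Classify each billing record and attribute what can be attributed.
-- Rows with a job_id come from job runs and can be tied to the job
-- owner; everything else is interactive/ad hoc usage.
SELECT
  CASE
    WHEN usage_metadata.job_id IS NOT NULL THEN 'job'
    ELSE 'interactive/ad hoc'
  END AS workload_type,
  COALESCE(identity_metadata.run_as, 'unattributed') AS run_as,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product IN ('ALL_PURPOSE', 'SQL')
GROUP BY ALL;
```

Whatever lands in the interactive, unattributed bucket is precisely the portion that cannot be tied to a single user today; for that slice, cluster tags and budget policies are the practical way to carve up cost going forward.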

 

Hope this helps, Lou.


