<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extracting cost by user (run_by) for All-purpose clusters and SQL warehouse usage in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/extracting-cost-by-user-run-by-for-all-purpose-clusters-and-sql/m-p/123424#M47006</link>
    <description>&lt;P&gt;Attributing compute usage to individual users is only partially supported for all-purpose clusters and SQL warehouses. Job compute (including serverless jobs) and workflows can be reliably attributed to the job owner or service principal. For interactive workloads and shared resources, attribution remains an estimate, and not every record can be tied to a user. Best practice: use job-level and cluster-level billing, join with access and activity event logs to approximate per-user usage where needed, and use the newer tagging and budget features for future workloads. Direct per-user billing at the cluster or query level is not available today, and audited, high-precision cost per user is not currently feasible for every usage scenario.&lt;/P&gt;
&lt;P&gt;Hope this helps, Lou.&lt;/P&gt;</description>
    <pubDate>Tue, 01 Jul 2025 12:02:02 GMT</pubDate>
    <dc:creator>Louis_Frolio</dc:creator>
    <dc:date>2025-07-01T12:02:02Z</dc:date>
    <item>
      <title>Extracting cost by user (run_by) for All-purpose clusters and SQL warehouse usage</title>
      <link>https://community.databricks.com/t5/data-engineering/extracting-cost-by-user-run-by-for-all-purpose-clusters-and-sql/m-p/123372#M46994</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm trying to extract usage cost per user (run_by) for workloads that utilize all-purpose clusters and SQL warehouses. I’ve been exploring the system.billing.usage table but noticed some challenges:&lt;/P&gt;&lt;P&gt;1. For records related to all-purpose clusters and SQL warehouses, the identity_metadata column often has null values for the run_as key.&lt;BR /&gt;2. The usage_metadata column also lacks identifying information like job_id, job_run_id, notebook_id, or job_name, making it hard to determine how the compute was used.&lt;/P&gt;&lt;P&gt;While joining system.billing.usage with system.access.audit on cluster_id and event_date helps retrieve some additional context, there are still many rows with no user info (run_by and run_as are both null).&lt;/P&gt;&lt;P&gt;Given that both all-purpose clusters and SQL warehouses can be used by multiple users (including ad hoc usage), I’m trying to determine:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;Is there a reliable way to distinguish whether a usage entry was triggered via a job or individual user activity?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;More importantly, is there any way to consistently populate run_by or run_as for all-purpose cluster and SQL warehouse entries, so we can compute cost per user accurately?&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Any insights, best practices, or workarounds would be appreciated.&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Jul 2025 02:30:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/extracting-cost-by-user-run-by-for-all-purpose-clusters-and-sql/m-p/123372#M46994</guid>
      <dc:creator>devyani_k</dc:creator>
      <dc:date>2025-07-01T02:30:51Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting cost by user (run_by) for All-purpose clusters and SQL warehouse usage</title>
      <link>https://community.databricks.com/t5/data-engineering/extracting-cost-by-user-run-by-for-all-purpose-clusters-and-sql/m-p/123424#M47006</link>
      <description>&lt;P&gt;Attributing compute usage to individual users is only partially supported for all-purpose clusters and SQL warehouses. Job compute (including serverless jobs) and workflows can be reliably attributed to the job owner or service principal. For interactive workloads and shared resources, attribution remains an estimate, and not every record can be tied to a user. Best practice: use job-level and cluster-level billing, join with access and activity event logs to approximate per-user usage where needed, and use the newer tagging and budget features for future workloads. Direct per-user billing at the cluster or query level is not available today, and audited, high-precision cost per user is not currently feasible for every usage scenario.&lt;/P&gt;
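&lt;P&gt;To make the approximation idea concrete, here is a minimal Python sketch, not an official Databricks method. It splits each cluster-day cost across users in proportion to how many activity events each user generated that day. The row layout, the precomputed cost field, the user field, and the function name estimate_cost_per_user are all illustrative assumptions; real rows would come from system.billing.usage joined with system.access.audit, and billing rows would first need to be converted from usage quantities to cost using your pricing data.&lt;/P&gt;

```python
from collections import defaultdict

# Hypothetical, simplified rows. Real rows would come from system.billing.usage
# and system.access.audit; the field names and values here are assumptions.
usage_rows = [
    {"cluster_id": "c1", "usage_date": "2025-06-30", "cost": 12.0},
    {"cluster_id": "c2", "usage_date": "2025-06-30", "cost": 5.0},
]
audit_rows = [
    {"cluster_id": "c1", "event_date": "2025-06-30", "user": "alice"},
    {"cluster_id": "c1", "event_date": "2025-06-30", "user": "alice"},
    {"cluster_id": "c1", "event_date": "2025-06-30", "user": "bob"},
]


def estimate_cost_per_user(usage_rows, audit_rows):
    """Split each cluster-day cost across users in proportion to event counts."""
    # Count audit events per (cluster, day) and per user.
    events = defaultdict(lambda: defaultdict(int))
    for row in audit_rows:
        key = (row["cluster_id"], row["event_date"])
        events[key][row["user"]] += 1

    cost_per_user = defaultdict(float)
    for row in usage_rows:
        key = (row["cluster_id"], row["usage_date"])
        users = events.get(key)
        if not users:
            # No user activity found: keep the cost visible instead of dropping it.
            cost_per_user["unattributed"] += row["cost"]
            continue
        total_events = sum(users.values())
        for user, count in users.items():
            cost_per_user[user] += row["cost"] * count / total_events
    return dict(cost_per_user)


estimates = estimate_cost_per_user(usage_rows, audit_rows)
```

&lt;P&gt;Event counts are a rough proxy for compute consumption, so treat the result as an estimate. Keeping an explicit "unattributed" bucket means the per-user totals still reconcile with the billed total.&lt;/P&gt;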
&lt;P&gt;Hope this helps, Lou.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Jul 2025 12:02:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/extracting-cost-by-user-run-by-for-all-purpose-clusters-and-sql/m-p/123424#M47006</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-07-01T12:02:02Z</dc:date>
    </item>
  </channel>
</rss>

