Organizations have expressed the need to see trends across their Databricks Accounts and drill down into Workspaces, SKUs, tags, and users. System Tables provide this visibility with little to no setup: just an API call to opt into certain schemas. In this blog, we will leverage Databricks System Tables for cost observability by using the billing, access, query, and compute schemas. We include a common template for a Lakeview Dashboard that is versioned and shareable in a public repo.
System tables provide a centralized way to gain insights into how your Databricks resources are functioning and changing over time. They must be enabled by an account admin and operate at the schema level.
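Once an account admin has enabled the schemas, a quick way to confirm what is available is to explore them directly from a SQL editor; for example:

-- List the system schemas enabled in your metastore
show schemas in system;

-- Inspect the columns of the billing usage table
describe table system.billing.usage;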
Here are the tables that will help you analyze costs at the Account level and drill down into Workspaces, Warehouses, Clusters, Users, and Tags.
| Table name | Definition |
| --- | --- |
| system.access.audit | Contains entries for events broken out by service and action names. For example, we filter on the notebook service and the runCommand action to determine the duration of a command run. |
| system.billing.usage | Helps us correlate runs from the query history and access audit tables to what was billed and by whom. |
| system.query.history | Captures user-level query execution information. |
| system.compute.clusters | Contains events relating back to the cluster configuration. |
| system.billing.list_prices | Contains the list prices for a given period. |
The billing usage table provides a custom_tags field that can be used as a parameter to filter workloads that are appropriately tagged. Because the Tags column in the billing usage table is a map, it can be filtered with Lakeview Parameters using a simple condition that selects either all tags or a specific key and value of the map, as seen below. These Parameters can then be part of the dashboard to filter the visuals appropriately:
:tag_name::string = 'all'
or u.custom_tags[:tag_name::string] = :tag_value::string
Parameters for Tag Name and Tag Value in the Dashboard
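For example, here is a sketch of how the condition plugs into a parameterized dashboard query (the surrounding select is illustrative; only the tag condition comes from the template):

select
  u.usage_date,
  round(sum(u.usage_quantity), 2) `DBUs`
from
  system.billing.usage u
where
  (
    :tag_name::string = 'all'
    or u.custom_tags[:tag_name::string] = :tag_value::string
  )
group by
  all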
| column | data type | source | description |
| --- | --- | --- | --- |
| workspace_id | string | system.access.audit | The unique ID of the workspace |
| workspace_name | string | system.access.audit | The name of the workspace |
| usage_date | date | system.billing.usage | Activity date on the workspace |
| cost | decimal | system.billing.list_prices | Total cost on the workspace per day |
| tag | string | system.billing.usage | Tag name |
| value | string | system.billing.usage | Value of the tag |
The combo chart below displays a month-over-month analysis of dollars and DBUs consumed for the Databricks account, representing the consumption of all Workspaces provisioned with the cloud provider.
The bar chart below drills down into Workspace consumption month over month.
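As a rough sketch of the rollup behind these month-over-month trend charts (aliases are ours; the pricing join mirrors the full queries later in this post):

select
  date_trunc('month', u.usage_date) `usage_month`,
  u.workspace_id,
  round(sum(u.usage_quantity), 2) `DBUs`,
  round(sum(u.usage_quantity * lp.pricing.default), 2) `dollar_dbus`
from
  system.billing.usage u
  inner join system.billing.list_prices lp on u.account_id = lp.account_id
  and u.cloud = lp.cloud
  and u.sku_name = lp.sku_name
  and u.usage_start_time between lp.price_start_time
  and coalesce(lp.price_end_time, u.usage_end_time)
group by
  all
order by
  usage_month asc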
The bar chart below shows the month-over-month $DBU consumption for various Databricks SKUs. The SKUs are taken from the sku_name field in the Usage table, and a simple case statement below gives us the categories of interest. Of course, these can be rolled up to a higher level with the new fields provided in the Usage table, product_features and billing_origin_product.
You can use this case-when statement to help summarize consumption by product SKU.
case
  when u.sku_name like '%SERVERLESS_SQL%' then 'SQL-Serverless'
  when u.sku_name like '%SQL_PRO%' then 'SQL-Pro'
  when u.sku_name like '%SQL%' then 'SQL-Classic'
  when u.sku_name like '%DLT_ADVANCED_COMPUTE%'
    and u.sku_name like '%PHOTON%' then 'DLT Advanced (PHOTON)'
  when u.sku_name like '%DLT_ADVANCED_COMPUTE%' then 'DLT Advanced'
  when u.sku_name like '%DLT_PRO_COMPUTE%'
    and u.sku_name like '%PHOTON%' then 'DLT Pro (PHOTON)'
  when u.sku_name like '%DLT_PRO_COMPUTE%' then 'DLT Pro'
  when u.sku_name like '%DLT_CORE_COMPUTE%'
    and u.sku_name like '%PHOTON%' then 'DLT Core (PHOTON)'
  when u.sku_name like '%DLT_CORE_COMPUTE%' then 'DLT Core'
  when u.sku_name like '%JOBS%'
    and u.sku_name like '%PHOTON%' then 'AUTOMATED (PHOTON)'
  when u.sku_name like '%JOBS%' then 'AUTOMATED'
  when u.sku_name like '%ALL_PURPOSE%'
    and u.sku_name like '%PHOTON%' then 'INTERACTIVE (PHOTON)'
  when u.sku_name like '%ALL_PURPOSE%' then 'INTERACTIVE'
  else u.sku_name -- fall back to the raw SKU name for anything uncategorized
end
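As a sketch of how the categorization is used, the case expression can be dropped into a monthly rollup like the one below (abbreviated here to three branches for brevity; in practice use the full statement above):

select
  date_trunc('month', u.usage_date) `usage_month`,
  case
    when u.sku_name like '%SERVERLESS_SQL%' then 'SQL-Serverless'
    when u.sku_name like '%JOBS%' then 'AUTOMATED'
    when u.sku_name like '%ALL_PURPOSE%' then 'INTERACTIVE'
    else u.sku_name
  end `sku_category`,
  round(sum(u.usage_quantity), 2) `DBUs`
from
  system.billing.usage u
group by
  all
order by
  usage_month asc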
This widget in the cost attribution dashboard represents the weekly cost for a DBSQL Warehouse.
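As a sketch of the rollup behind that widget (aliases are ours; the pricing join mirrors the full queries below):

select
  date_trunc('week', u.usage_date) `usage_week`,
  u.usage_metadata.warehouse_id `warehouse_id`,
  round(sum(u.usage_quantity * lp.pricing.default), 2) `dollar_dbus`
from
  system.billing.usage u
  inner join system.billing.list_prices lp on u.account_id = lp.account_id
  and u.cloud = lp.cloud
  and u.sku_name = lp.sku_name
  and u.usage_start_time between lp.price_start_time
  and coalesce(lp.price_end_time, u.usage_end_time)
where
  u.sku_name like '%SQL%'
group by
  all
order by
  usage_week asc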
The table below shows the hourly warehouse consumption pattern and ties the warehouse usage cost back to a user. The query history table is grouped into hours of the day so that the duration of the queries executed within each hour can be matched against the Usage table's cost for that hour. A simple calculation comparing a user's query duration to the hourly total gives a sense of the percentage of cost that can be attributed to that user.
*As a disclaimer, the estimates below depict the duration of the query relative to the Usage table's billing for that hour. Other factors may be needed for appropriate chargeback, but this gives a reasonable view of which users are contributing to costs based on query time.
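As a worked example with made-up numbers: if a warehouse billed $10.00 in an hour and a user's queries account for 900 of the 3,600 total query-seconds in that hour, the user is attributed 25% of that hour's cost:

-- Hypothetical figures: $10.00 hourly cost; 900 s of the user's query time
-- out of 3,600 s of total query time in the hour
select round(10.00 * (900 / 3600), 2) `cost_by_user_in_hour`; -- returns 2.5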
The table below shows the hourly warehouse consumption pattern and ties the warehouse usage cost back to a user.
Below is a snippet of how this is achieved; the full code can be found in the repo.
with usage_by_user as (
select
workspace_id,
executed_by,
compute.warehouse_id,
date(start_time) `event_date`,
hour(start_time) `hour`,
sum(total_duration_ms) / 1000 `total_duration_by_user_in_hour`,
(
select
sum(qhs.total_duration_ms) / 1000
from
system.query.history qhs
where
date(qhs.start_time) = date(qh.start_time)
and date(qhs.end_time) = date(qh.end_time)
and hour(qhs.start_time) = hour(qh.start_time)
and qhs.workspace_id = qh.workspace_id
and qhs.workspace_id = :workspace_id::string
and qhs.compute.warehouse_id = qh.compute.warehouse_id
group by
date(qhs.start_time),
hour(qhs.start_time),
qhs.workspace_id,
qhs.compute.warehouse_id
) `total_duration_by_hour`
from
system.query.history qh
where
date(start_time) >= date_add(week, - :duration_in_weeks :: integer, now())
and qh.workspace_id = :workspace_id::string
group by
all
)
select
u.workspace_id,
u.executed_by,
u.warehouse_id,
u.event_date,
u.hour,
round(t.dollar_dbus, 2) `cost_by_hour`,
round(t.DBUs, 2) `dbus_by_hour`,
u.total_duration_by_user_in_hour,
u.total_duration_by_hour,
t.custom_tags,
round(
(
total_duration_by_user_in_hour / total_duration_by_hour
) * 100,
2
) `percent_consumption_by_user_in_hour`,
round(
t.dollar_dbus * `percent_consumption_by_user_in_hour` / 100,
2
) `cost_by_user_in_hour`
from
usage_by_user u
inner join (
select
date(u.usage_start_time) `event_date`,
hour(u.usage_start_time) `hour`,
u.workspace_id,
u.usage_metadata.warehouse_id `warehouse_id`,
to_json(u.custom_tags) `custom_tags`,
round(sum(usage_quantity), 2) `DBUs`,
sum(usage_quantity * lp.pricing.default) `dollar_dbus`
from
system.billing.usage u
INNER JOIN system.billing.list_prices lp ON u.account_id = lp.account_id
and u.cloud = lp.cloud
and u.sku_name = lp.sku_name
and u.workspace_id = :workspace_id::string
and u.usage_start_time between lp.price_start_time
and coalesce(lp.price_end_time, u.usage_end_time)
and u.usage_date >= date_add(week, - :duration_in_weeks :: integer, now())
and u.sku_name like '%SQL%'
and (
:tag_name :: string = 'all'
or u.custom_tags [:tag_name::string] = :tag_value :: string
)
group by
all
) T ON u.workspace_id = T.workspace_id
and u.workspace_id = :workspace_id::string
and u.warehouse_id = t.warehouse_id
and u.event_date = t.event_date
and u.hour = t.hour
order by
event_date asc,
hour asc
| column | data type | source | description |
| --- | --- | --- | --- |
| workspace_id | String | system.query.history | The ID of the workspace this query statement belongs to |
| warehouse_id | String | system.query.history.compute | The ID of the warehouse this query statement runs on |
| total_cost_by_hour | Decimal | system.billing.usage | Usage by SKU (e.g., compute type) per hour, in dollars |
| total_dbus_by_hour | Integer | system.billing.usage | Usage by SKU (e.g., compute type) per hour, in DBUs |
| total_consumption_by_hour | Integer | system.query.history | Total duration of all queries in the hour, summed from total_duration_ms |
| total_consumption_by_user_in_hour | Integer | system.query.history | User's total usage per hour |
| percent_of_consumption_by_user_in_hour | Decimal | derived field | User's total usage percentage per hour in the warehouse |
| cost_by_user_in_hour | Decimal | derived field | User's total cost per hour in the warehouse |
| tags | MAP<STRING, STRING> | system.billing.usage | The user-supplied tags associated with this warehouse |
| executed_by | String | system.query.history | The ID of the principal who ran the query |
| event_date | Date | system.query.history | The day when the query was executed |
| event_hour | Integer | system.query.history | The hour of day when the query was executed |
The bar chart below looks at cluster costs over a time period. The query uses the audit, usage, and cluster tables to display trends. When working with the clusters table, it is important to ensure the max record is taken to retrieve the latest configuration, as seen in the code snippet below. A subquery is leveraged to retrieve the max change record for a given cluster and join against the usage table.
inner join (
select
max(change_time),
cluster_name,
cluster_id,
workspace_id,
account_id
from
system.compute.clusters
group by
all
)
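Note that grouping by cluster_name can produce duplicate rows if a cluster was renamed over time; an alternative sketch keeps only the most recent change record per cluster with a window function (Databricks SQL supports QUALIFY):

select
  cluster_name,
  cluster_id,
  workspace_id,
  account_id
from
  system.compute.clusters
qualify
  -- keep only the latest configuration record for each cluster
  row_number() over (
    partition by workspace_id, cluster_id
    order by change_time desc
  ) = 1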
The table below shows the hourly cluster consumption and ties the usage back to a user. The audit log's runCommand events are grouped into hours of the day so that the duration of the commands executed within each hour can be matched against the Usage table's cost for that hour. A simple calculation comparing a user's command duration to the hourly total gives a sense of the percentage of cost that can be attributed to that user. It is important to note that because this uses the Audit Log table, the insights gathered depend on verbose audit logs being enabled to capture notebook commands being run.
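A quick way to check whether verbose audit logging is capturing commands is to count recent runCommand events; if this returns zero, the user-level attribution below will come back empty:

-- Returns 0 when verbose audit logs are not capturing notebook commands
select count(*) `run_command_events`
from system.access.audit
where action_name = 'runCommand'
  and event_date >= current_date() - interval 7 days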
*As a disclaimer, the estimates below depict the duration of the commands relative to the Usage table's billing for that hour. Other factors may be needed for appropriate chargeback, but this gives a reasonable view of which users are contributing to costs based on command execution time.
The table below shows the hourly cluster consumption pattern and ties the cluster usage cost back to a user.
Below is a snippet of how this is achieved; the full code can be found in the repo.
with usage_by_user as (
select
a.workspace_id,
a.user_identity.email,
a.request_params.clusterId `cluster_id`,
a.event_date,
hour(a.event_time) `hour`,
sum(a.request_params.executionTime) `total_duration_by_user_in_hour`
from
system.access.audit a
where
a.service_name in ('notebook', 'jobs')
and a.action_name = 'runCommand'
and a.event_date >= date_add(week, - :duration_in_weeks :: integer, now())
and a.request_params.clusterId is not null
-- and a.workspace_id = :workspace_id :: string
group by
all
),
total_cost_by_cluster as (
select
a1.event_date `event_date`,
hour(a1.event_time) `hour`,
a1.request_params.clusterId `clusterId`,
sum(a1.request_params.executionTime) `total_duration_by_hour`
from
system.access.audit a1
where
a1.service_name in ('notebook', 'jobs')
and a1.action_name = 'runCommand'
and a1.event_date >= date_add(week, - :duration_in_weeks :: integer, now())
and a1.request_params.clusterId is not null
-- and a1.workspace_id = :workspace_id :: string
group by
all
)
select
u.workspace_id,
u.email `executed_by`,
c.cluster_name,
u.cluster_id,
u.event_date,
u.hour,
round(t.dollar_dbus, 2) `cost_by_hour`,
t.DBUs `dbus_by_hour`,
u.total_duration_by_user_in_hour,
tc.total_duration_by_hour,
t.custom_tags,
round(
(
total_duration_by_user_in_hour / total_duration_by_hour
) * 100,
2
) `percent_consumption_by_user_in_hour`,
round(
t.dollar_dbus * `percent_consumption_by_user_in_hour` / 100,
2
) `cost_by_user_in_hour`
from
usage_by_user u
inner join total_cost_by_cluster tc on u.event_date = tc.event_date
and u.hour = tc.hour
and u.cluster_id = tc.clusterId
inner join (
select
date(u.usage_start_time) `event_date`,
hour(u.usage_start_time) `hour`,
u.workspace_id,
u.usage_metadata.cluster_id `cluster_id`,
to_json(u.custom_tags) `custom_tags`,
round(sum(usage_quantity), 2) `DBUs`,
sum(usage_quantity * lp.pricing.default) `dollar_dbus`
from
system.billing.usage u
INNER JOIN system.billing.list_prices lp ON u.account_id = lp.account_id
and u.cloud = lp.cloud
and u.sku_name = lp.sku_name
and u.usage_start_time between lp.price_start_time
and coalesce(lp.price_end_time, u.usage_end_time)
and u.usage_date >= date_add(week, - :duration_in_weeks :: integer, now())
and u.sku_name like '%ALL_PURPOSE%'
and (
:tag_name :: string = 'all'
or u.custom_tags [:tag_name::string] = :tag_value :: string
)
group by
all
) T ON u.workspace_id = T.workspace_id
and u.cluster_id = t.cluster_id
and u.event_date = t.event_date
and u.hour = t.hour
inner join (
select
max(change_time),
cluster_name,
cluster_id,
workspace_id,
account_id
from
system.compute.clusters
group by
all
) c on c.cluster_id = u.cluster_id
and c.workspace_id = u.workspace_id
order by
event_date asc,
hour asc
| column | data type | source | description |
| --- | --- | --- | --- |
| workspace_id | String | system.access.audit | The ID of the workspace this command statement belongs to |
| cluster_id | String | system.billing.usage | The ID of the cluster this command statement belongs to |
| cluster_name | String | system.compute.clusters | The name of the cluster this command statement belongs to |
| total_cost_by_hour | Decimal | system.billing.usage | Usage by SKU (e.g., compute type) per hour, in dollars |
| total_dbus_by_hour | Integer | system.billing.usage | Usage by SKU (e.g., compute type) per hour, in DBUs |
| total_consumption_by_hour | Integer | system.access.audit | Total command execution time in the hour, summed from request_params.executionTime |
| total_consumption_by_user_in_hour | Integer | system.access.audit | User's total usage per hour |
| percent_of_consumption_by_user_in_hour | Decimal | derived field | User's total usage percentage per hour on the cluster |
| cost_by_user_in_hour | Decimal | derived field | User's total cost per hour on the cluster |
| tags | MAP<STRING, STRING> | system.billing.usage | The user-supplied tags associated with this cluster |
| executed_by | String | system.access.audit | The ID of the principal who ran the commands |
| event_date | Date | system.access.audit | The day when the command was executed |
| event_hour | Integer | system.access.audit | The hour of day when the command was executed |
In this blog post, we covered the key tables needed to gather insights for cost attribution. The samples provided detailed cost analyses for DBSQL Warehouses, clusters, Workspaces, SKUs, and users. This visibility into consumption trends empowers organizations to make informed decisions, optimize resource utilization, and effectively manage their Databricks consumption across different teams, workspaces, and use cases. Sharing this Lakeview Dashboard template is the beginning of more templates to come as System Tables evolve. The template can be downloaded from the public repo and imported into any Databricks Workspace with Unity Catalog enabled.