Databricks Unity Catalog (UC) hosts system tables that contain operational data for your account. There are tables to track audit logs, billing, lineage, and more. In this article, we will see how to use the system.billing.usage table to get insights into the DBU consumption of your Databricks account. We will write queries to track usage, build a dashboard for visualization, and set an alert to notify admins when DBU consumption crosses a specified threshold.
Each row in the billing usage table contains the account ID, workspace ID, SKU of the compute, usage date and time, custom tags, usage quantity and unit, and some additional metadata.
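To get a feel for the data, we can preview a few rows. This is a minimal sketch; the column names follow the documented system.billing.usage schema, but it is worth checking them against your workspace:

```sql
-- Preview a few recent rows from the billing usage table.
SELECT
  account_id,
  workspace_id,
  sku_name,
  usage_date,
  usage_unit,
  usage_quantity,
  custom_tags
FROM system.billing.usage
ORDER BY usage_date DESC
LIMIT 10;
```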
In the Databricks SQL Editor or a Databricks Notebook, we can write a query like the one below to fetch details about DBUs consumed and cost (per list price) on a daily basis for each SKU type.
WITH agg_data AS (
  SELECT ds, sku_group AS sku, workspace_id,
         SUM(dbus) AS dbus,
         SUM(cost_at_list_price) AS cost_at_list_price
  FROM (
    SELECT workspace_id, ds, dbus, cost_at_list_price, sku,
           CASE
             WHEN sku LIKE '%ALL_PURPOSE%' THEN 'ALL_PURPOSE'
             WHEN sku LIKE '%JOBS%' THEN 'JOBS'
             WHEN sku LIKE '%DLT%' THEN 'DLT'
             WHEN sku LIKE '%SQL%' THEN 'SQL'
             WHEN sku LIKE '%INFERENCE%' THEN 'MODEL_INFERENCE'
             ELSE 'OTHER'
           END AS sku_group
    FROM (
      SELECT u.workspace_id,
             u.usage_date AS ds,
             u.sku_name AS sku,
             CAST(u.usage_quantity AS DOUBLE) AS dbus,
             CAST(lp.pricing.default * u.usage_quantity AS DOUBLE) AS cost_at_list_price
      FROM system.billing.usage u
      INNER JOIN system.billing.list_prices lp
        ON u.cloud = lp.cloud
       AND u.sku_name = lp.sku_name
       AND u.usage_start_time >= lp.price_start_time
      WHERE u.usage_unit = 'DBU'
    )
  )
  GROUP BY ds, sku_group, workspace_id
)
SELECT ds, sku,
       SUM(dbus) AS dbus,
       SUM(cost_at_list_price) AS cost_at_list_price
FROM agg_data
WHERE ds <= now()
  AND ds >= SUBSTRING_INDEX('{{start_date}}', ' |', 1)
GROUP BY ds, sku
In this query, we join the usage table with system.billing.list_prices to compute the cost at list price, bucket SKU names into broad categories (all-purpose, jobs, DLT, SQL, model inference), keep only rows measured in DBUs, and aggregate consumption per day and SKU category. The {{start_date}} token is a query parameter.
If we want the filter to be a variable value, we can use Query Parameters. Databricks supports static parameter values, such as text, numbers, dates, or a dropdown list, but it also supports dynamic values, such as the results of another query.
To use the output of one query as a parameter to another, we first create the query below.
SELECT CONCAT(start_date, ' | ', label)
FROM (
  SELECT CAST(range_start AS DATE) AS start_date, label, o
  FROM (
    SELECT DATE_TRUNC('DAY', CURRENT_DATE()) AS range_start, 'Current Date' AS label, 1 AS o
    UNION
    SELECT DATE_TRUNC('WEEK', CURRENT_DATE()) AS range_start, 'Current Week' AS label, 2 AS o
    UNION
    SELECT DATE_TRUNC('MONTH', CURRENT_DATE()) AS range_start, 'Current Month' AS label, 3 AS o
    UNION
    SELECT DATE_TRUNC('YEAR', CURRENT_DATE()) AS range_start, 'Current Year' AS label, 4 AS o
    UNION
    SELECT add_months(DATE_TRUNC('MONTH', CURRENT_DATE()), -6) AS range_start, 'Last 6 Months' AS label, 5 AS o
    UNION
    SELECT add_months(DATE_TRUNC('MONTH', CURRENT_DATE()), -12) AS range_start, 'Last 12 Months' AS label, 6 AS o
  )
  ORDER BY o
);
Here, we define several date ranges relative to the current date, and the output of this query will look like this:
We will use these date ranges as input to our billing usage query. To do that, we first need to save this query. Then we can go to our usage query and select the saved query as the source for the start_date parameter.
Now we can run our query and get back usage results.
Next, we will add a visualization to see this as a bar chart.
Similarly, we can create other queries and visualizations to view usage per workspace, or per custom tag.
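As a sketch of the per-tag variant: custom_tags is a map column in system.billing.usage, so we can group by one of its keys. The tag name team below is hypothetical; substitute whatever tags your clusters actually carry:

```sql
-- DBUs per day, grouped by a (hypothetical) 'team' custom tag.
SELECT
  usage_date,
  custom_tags['team'] AS team,  -- assumes resources are tagged with a 'team' key
  SUM(CAST(usage_quantity AS DOUBLE)) AS dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
GROUP BY usage_date, custom_tags['team']
ORDER BY usage_date;
```

A per-workspace breakdown follows the same pattern, grouping by workspace_id instead of the tag.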
We can put all of the queries created above in a dashboard to view all the charts on one page.
The query parameter will appear on top of the dashboard. If we change the date range, all the charts will get refreshed to display the usage for the new time frame.
These dashboards can be refreshed automatically and shared with other users, and a subscription email list can be added so that recipients receive a snapshot of the dashboard on a set schedule.
From the queries created above, we can set alerts that send out notifications if DBU usage goes above a specified threshold.
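An alert needs a query whose result can be compared against the threshold. A minimal sketch is shown below; the 7-day window is an arbitrary choice for illustration:

```sql
-- Total DBUs consumed over the last 7 days; the alert fires
-- when this value exceeds the configured threshold.
SELECT SUM(CAST(usage_quantity AS DOUBLE)) AS total_dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 7 DAYS;
```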
The ‘Custom template’ option lets us set any custom email format that we want. We have to provide the mail body as HTML, and we can include built-in template variables in the body. The Formatting Guide hyperlink just below the template lists these variables.
For example, we can set this in the email template:
DBU used: {{QUERY_RESULT_VALUE}}
<br/><br/>
{{QUERY_RESULT_TABLE}}
<br/><br/>
DBU threshold: {{ALERT_THRESHOLD}}
<br/><br/>
Please find the dashboard link <a href="https://<workspace_url>/sql/dashboards/<dashboard_id>?o=<workspace_id>">here</a>.
Here, <workspace_url>, <dashboard_id>, and <workspace_id> have to be replaced with the proper values.
We can set individual refresh schedules for the dashboard and alert, or we can connect these together in a single workflow so that every time the dashboard is refreshed, the alert is also executed to check if the threshold limit is exceeded.
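As a sketch, such a workflow can be expressed as a Databricks job with two SQL tasks, where the alert task depends on the dashboard refresh. The IDs and warehouse_id below are placeholders, and the exact field names should be verified against the Jobs API reference:

```json
{
  "name": "dbu-usage-monitoring",
  "schedule": { "quartz_cron_expression": "0 0 6 * * ?", "timezone_id": "UTC" },
  "tasks": [
    {
      "task_key": "refresh_dashboard",
      "sql_task": {
        "dashboard": { "dashboard_id": "<dashboard_id>" },
        "warehouse_id": "<warehouse_id>"
      }
    },
    {
      "task_key": "check_alert",
      "depends_on": [ { "task_key": "refresh_dashboard" } ],
      "sql_task": {
        "alert": { "alert_id": "<alert_id>" },
        "warehouse_id": "<warehouse_id>"
      }
    }
  ]
}
```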
We can set a schedule to run this job daily to refresh the dashboard. If the alert threshold is exceeded, an email will be sent to the alert subscribers.
This custom email will be sent to the subscribers if the alert gets triggered.
In this article, we learned how to use billing system tables to get insights on our DBU consumption, visualize usage in dashboards, and set alerts. We also discussed how Workflows can be used to refresh the dashboard and execute alerts to send out custom email notifications.