Databricks Unity Catalog (UC) hosts System Tables that contain the operational data of your account, including tables that track audit logs, billing, lineage, and more. In this article, we will see how to use the system.billing.usage table to get insights into the DBU consumption of your Databricks account. We will write queries to track usage, create a dashboard for visualization, and set an alert that notifies admins if DBU consumption crosses a specific threshold.
Query the Billing System Tables
Each row in the billing usage table contains the account ID, workspace ID, compute SKU, usage date and time, custom tags, usage unit, and some additional metadata.
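For example, we can preview a few of these columns directly. This is just a quick exploratory query; it assumes you have been granted read access to the system catalog:
SELECT
  account_id,
  workspace_id,
  sku_name,
  usage_date,
  usage_start_time,
  usage_unit,
  usage_quantity,
  custom_tags
FROM system.billing.usage
LIMIT 10;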
In the Databricks SQL Editor or a Databricks Notebook, we can write a query like the one below to fetch details about DBUs consumed and cost (per list price) on a daily basis for each SKU type.
-- Join usage records with list prices to compute cost, bucket SKU names into broad
-- categories, and aggregate DBUs and list-price cost per day, SKU, and workspace.
WITH agg_data AS (
  SELECT ds, sku1 AS sku, workspace_id, SUM(dbus) AS dbus, SUM(cost_at_list_price) AS cost_at_list_price
  FROM (
    SELECT workspace_id, ds, dbus, cost_at_list_price, sku,
      CASE
        WHEN sku LIKE '%ALL_PURPOSE%' THEN 'ALL_PURPOSE'
        WHEN sku LIKE '%JOBS%' THEN 'JOBS'
        WHEN sku LIKE '%DLT%' THEN 'DLT'
        WHEN sku LIKE '%SQL%' THEN 'SQL'
        WHEN sku LIKE '%INFERENCE%' THEN 'MODEL_INFERENCE'
        ELSE 'OTHER'
      END AS sku1
    FROM (
      SELECT u.workspace_id, u.usage_date AS ds, u.sku_name AS sku,
        CAST(u.usage_quantity AS DOUBLE) AS dbus,
        CAST(lp.pricing.default * u.usage_quantity AS DOUBLE) AS cost_at_list_price
      FROM system.billing.usage u
      INNER JOIN system.billing.list_prices lp
        ON u.cloud = lp.cloud
        AND u.sku_name = lp.sku_name
        AND u.usage_start_time >= lp.price_start_time
      WHERE u.usage_unit = 'DBU'
    )
  )
  GROUP BY ds, sku1, workspace_id
)
-- {{start_date}} is a query parameter; SUBSTRING_INDEX keeps the date portion before the ' |' separator.
SELECT ds, sku, SUM(dbus) AS dbus, SUM(cost_at_list_price) AS cost_at_list_price
FROM agg_data
WHERE ds <= now()
  AND ds >= SUBSTRING_INDEX('{{start_date}}', ' |', 1)
GROUP BY ds, sku
In this query, we:
- Get records from the system.billing.usage table and join them with the system.billing.list_prices table to convert DBUs into a dollar amount at list price.
- Use a CASE statement to group the SKU types into five broad categories (ALL_PURPOSE, JOBS, DLT, SQL, and MODEL_INFERENCE), with anything else falling under OTHER.
- Group by date and SKU to aggregate daily usage metrics.
- Apply a date filter that determines the time window for which daily usage is reported.
If we want the filter to be a variable value, we can use query parameters. Databricks supports static parameter values such as text, numbers, dates, or a dropdown list, and it also supports dynamic values, such as the results of another query.
To use the output of one query as a parameter to another, we will first create the query below.
SELECT CONCAT(start_date, ' | ', label)
FROM (
  SELECT CAST(current_date AS DATE) AS start_date, label, o
  FROM (
    SELECT DATE_TRUNC('DAY', CURRENT_DATE()) AS current_date, 'Current Date' AS label, 1 AS o
    UNION
    SELECT DATE_TRUNC('WEEK', CURRENT_DATE()) AS current_date, 'Current Week' AS label, 2 AS o
    UNION
    SELECT DATE_TRUNC('MONTH', CURRENT_DATE()) AS current_date, 'Current Month' AS label, 3 AS o
    UNION
    SELECT DATE_TRUNC('YEAR', CURRENT_DATE()) AS current_date, 'Current Year' AS label, 4 AS o
    UNION
    SELECT ADD_MONTHS(DATE_TRUNC('MONTH', CURRENT_DATE()), -6) AS current_date, 'Last 6 Months' AS label, 5 AS o
    UNION
    SELECT ADD_MONTHS(DATE_TRUNC('MONTH', CURRENT_DATE()), -12) AS current_date, 'Last 12 Months' AS label, 6 AS o
  )
  ORDER BY o
);
Here, we are setting different date ranges based on the current date, and the output of this query will look like this:
We will use these date ranges as input to our billing usage query. To do that, first we need to save this query. Then we can go to our usage query and select the saved query as a parameter for the start_date filter.
Now we can run our query and get back usage results.
Next, we will add a visualization to see this as a bar chart.
Similarly, we can create other queries and visualizations to view usage per workspace, or per custom tag.
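For example, a per-workspace breakdown can reuse the same joins. The sketch below is illustrative rather than one of the saved dashboard queries, and the custom_tags key mentioned in the comment is a hypothetical example:
-- Daily DBUs and list-price cost per workspace
-- For per-tag usage, group by a key of the custom_tags map instead, e.g. u.custom_tags['cost_center']
SELECT
  u.workspace_id,
  u.usage_date AS ds,
  SUM(CAST(u.usage_quantity AS DOUBLE)) AS dbus,
  SUM(CAST(lp.pricing.default * u.usage_quantity AS DOUBLE)) AS cost_at_list_price
FROM system.billing.usage u
INNER JOIN system.billing.list_prices lp
  ON u.cloud = lp.cloud
  AND u.sku_name = lp.sku_name
  AND u.usage_start_time >= lp.price_start_time
WHERE u.usage_unit = 'DBU'
GROUP BY u.workspace_id, u.usage_date
ORDER BY ds, workspace_id;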
Visualization in a DBSQL Dashboard
We can add all of the queries created above to a dashboard to view all the charts on one page.
The query parameter appears at the top of the dashboard. If we change the date range, all the charts refresh to display the usage for the new time frame.
These dashboards can be refreshed automatically, shared with other users, and a subscription email list can be added to receive a snapshot of the dashboard on a set schedule.
Create an alert to monitor usage
From the queries created above, we can set alerts that send out notifications if DBU usage goes above a specified threshold.
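For example, the alert can be backed by a small query that returns a single value to compare against the threshold. The sketch below is an assumption for illustration (it is not one of the dashboard queries) and simply sums yesterday's DBU consumption:
-- Total DBUs consumed yesterday; the alert condition compares this value to the threshold
SELECT SUM(CAST(usage_quantity AS DOUBLE)) AS dbus_yesterday
FROM system.billing.usage
WHERE usage_unit = 'DBU'
  AND usage_date = DATE_SUB(CURRENT_DATE(), 1);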
The ‘Custom template’ option allows us to set any custom email format that we want. We have to provide the email body as HTML, and we can include some built-in template variables in the body. A Formatting Guide hyperlink just below the template contains these details.
For example, we can set this in the email template:
DBU used: {{QUERY_RESULT_VALUE}}
<br><br>
{{QUERY_RESULT_TABLE}}
<br><br>
DBU threshold: {{ALERT_THRESHOLD}}
<br><br>
Please find the dashboard link <a href="https://<workspace_url>/sql/dashboards/<dashboard_id>?o=<workspace_id>">here</a>.
Here, <workspace_url>, <dashboard_id>, and <workspace_id> have to be populated with the proper values.
Connect dashboard and alert using a Databricks Workflow
We can set individual refresh schedules for the dashboard and the alert, or we can connect them in a single workflow so that every time the dashboard is refreshed, the alert is also executed to check whether the threshold has been exceeded.
We can schedule this job to run daily to refresh the dashboard. If the alert threshold is exceeded, an email will be sent to the alert subscribers.
This custom email is sent to the subscribers when the alert is triggered.
Conclusion
In this article, we learned how to use the billing system tables to get insights into our DBU consumption, visualize usage in dashboards, and set alerts. We also discussed how Workflows can be used to refresh the dashboard and execute alerts that send out custom email notifications.