Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
MohanaBasak

Databricks Unity Catalog (UC) hosts System Tables which contain the operational data of the account. There are tables to track audit logs, billing, lineage, and more.  In this article, we will see how to use the system.billing.usage table to get insights about the DBU consumption of your Databricks account. We will write queries to track usage, create a dashboard for visualization, and set an alert to notify admins if DBU consumption crosses a specific threshold.

Query the Billing System Tables

Each row in the billing usage table contains the account ID, workspace ID, SKU of the compute, usage date and time, custom tags, usage unit, and some additional metadata.

In the Databricks SQL Editor or a Databricks Notebook, we can write a query like the one below to fetch the DBUs consumed and the cost (at list price) on a daily basis for each SKU type.

 

WITH agg_data AS (
  SELECT ds, sku1 AS sku, workspace_id, SUM(dbus) AS dbus, SUM(cost_at_list_price) AS cost_at_list_price
  FROM (
    SELECT workspace_id, ds, dbus, cost_at_list_price, sku,
      CASE
        WHEN sku LIKE '%ALL_PURPOSE%' THEN 'ALL_PURPOSE'
        WHEN sku LIKE '%JOBS%' THEN 'JOBS'
        WHEN sku LIKE '%DLT%' THEN 'DLT'
        WHEN sku LIKE '%SQL%' THEN 'SQL'
        WHEN sku LIKE '%INFERENCE%' THEN 'MODEL_INFERENCE'
        ELSE 'OTHER'
      END AS sku1
    FROM (
      SELECT u.workspace_id, u.usage_date AS ds, u.sku_name AS sku,
        CAST(u.usage_quantity AS DOUBLE) AS dbus,
        CAST(lp.pricing.default * u.usage_quantity AS DOUBLE) AS cost_at_list_price
      FROM
        system.billing.usage u
        INNER JOIN system.billing.list_prices lp ON u.cloud = lp.cloud
        AND u.sku_name = lp.sku_name
        AND u.usage_start_time >= lp.price_start_time
        -- avoid double-counting when a SKU has had multiple price periods
        AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
      WHERE
        u.usage_unit = 'DBU'
    )
  )
  GROUP BY ds, sku1, workspace_id
)
SELECT ds, sku, SUM(dbus) AS dbus, SUM(cost_at_list_price) AS cost_at_list_price
FROM
  agg_data
WHERE
  ds <= now()
  AND ds >= SUBSTRING_INDEX('{{start_date}}', ' |', 1)
GROUP BY ds, sku

 

In this query, we:

  • Get records from the system.billing.usage table and join them with the system.billing.list_prices table to convert DBUs into a dollar cost at list price.
  • Use a CASE statement to group the SKU types into five broad categories (ALL_PURPOSE, JOBS, DLT, SQL, and MODEL_INFERENCE), with everything else falling into OTHER.
  • Group by date and SKU to aggregate daily usage metrics.
  • Apply a date filter that determines the time window for which we want the daily usage.
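The date filter receives its value as a combined "date | label" string, and the SUBSTRING_INDEX call keeps only the part before the first ' |' delimiter. A minimal Python sketch of that parsing logic (the helper name is ours, not a Databricks API):

```python
def substring_index(value: str, delimiter: str, count: int) -> str:
    """Mimic SQL SUBSTRING_INDEX: return the text before the count-th delimiter."""
    return delimiter.join(value.split(delimiter)[:count])

# The dropdown parameter value combines a date and a human-readable label.
param = "2024-01-01 | Current Year"
print(substring_index(param, " |", 1))  # -> 2024-01-01
```

This is why the query can compare `ds` directly against the parsed parameter: only the ISO date survives the split.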

If we want the filter to be a variable value, we can use query parameters. Databricks supports static parameter values, like text, numbers, dates, or a dropdown list, but it also supports dynamic values, like the results of another query.

To use the output of one query as a parameter to another, we will first create the query below.

 

SELECT CONCAT(start_date, ' | ', label)
FROM (
  SELECT CAST(range_start AS DATE) AS start_date, label, o
  FROM (
    SELECT DATE_TRUNC('DAY', CURRENT_DATE()) AS range_start, 'Current Date' AS label, 1 AS o
    UNION
    SELECT DATE_TRUNC('WEEK', CURRENT_DATE()), 'Current Week', 2
    UNION
    SELECT DATE_TRUNC('MONTH', CURRENT_DATE()), 'Current Month', 3
    UNION
    SELECT DATE_TRUNC('YEAR', CURRENT_DATE()), 'Current Year', 4
    UNION
    SELECT ADD_MONTHS(DATE_TRUNC('MONTH', CURRENT_DATE()), -6), 'Last 6 Months', 5
    UNION
    SELECT ADD_MONTHS(DATE_TRUNC('MONTH', CURRENT_DATE()), -12), 'Last 12 Months', 6
  )
  ORDER BY o
);
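Each UNION branch anchors a range start by truncating or shifting the current date. The same truncations can be sketched in Python with the standard library (the helper names are ours, chosen to mirror the SQL functions):

```python
from datetime import date

def trunc_week(d: date) -> date:
    """Monday of the current week, like DATE_TRUNC('WEEK', ...)."""
    return date.fromordinal(d.toordinal() - d.weekday())

def trunc_month(d: date) -> date:
    """First day of the month, like DATE_TRUNC('MONTH', ...)."""
    return d.replace(day=1)

def trunc_year(d: date) -> date:
    """First day of the year, like DATE_TRUNC('YEAR', ...)."""
    return d.replace(month=1, day=1)

def add_months(d: date, months: int) -> date:
    """Shift a month-start date by whole months, like Spark's add_months."""
    total = d.year * 12 + (d.month - 1) + months
    return d.replace(year=total // 12, month=total % 12 + 1)

today = date(2024, 5, 8)  # example anchor date (a Wednesday)
print(trunc_week(today))                   # 2024-05-06
print(trunc_month(today))                  # 2024-05-01
print(add_months(trunc_month(today), -6))  # 2023-11-01
```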

 

Here, we are setting different date ranges based on the current date, and the output of this query will look like this:

[Screenshot: output of the date-range query, one "date | label" row per range]

We will use these date ranges as input to our billing usage query. To do that, we first need to save this query. Then we can go to our usage query and select the saved query as the source for the start_date parameter.

[Screenshot: start_date parameter configured to use the saved date-range query]

Now we can run our query and get back usage results.

[Screenshot: daily DBU usage query results]

Next, we will add a visualization to see this as a bar chart.

[Screenshot: bar chart visualization of daily DBU usage by SKU]

Similarly, we can create other queries and visualizations to view usage per workspace, or per custom tag. 

 

Visualization in a DBSQL Dashboard

We can add all of the queries created above to a dashboard to view all the charts on one page.

[Screenshot: dashboard showing the usage charts together]

The query parameter will appear on top of the dashboard. If we change the date range, all the charts will get refreshed to display the usage for the new time frame.

[Screenshot: dashboard refreshed after changing the date-range parameter]

These dashboards can be refreshed automatically, shared with other users, and configured with a subscription email list that receives a snapshot of the dashboard on a set schedule.

Create an alert to monitor usage

From the queries created above, we can set alerts that send out notifications when DBU usage goes above a specified threshold.
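Conceptually, an alert just evaluates the query result against a condition on each run. The check can be sketched as follows (the rows and threshold are illustrative values, not real account data):

```python
# Each row: (date, sku, dbus), as returned by the usage query. Values are made up.
rows = [
    ("2024-05-06", "JOBS", 120.0),
    ("2024-05-06", "SQL", 45.5),
    ("2024-05-07", "JOBS", 310.0),
]

DBU_THRESHOLD = 400.0  # assumed threshold for this sketch

# Sum DBUs across all rows and compare against the threshold,
# which is what the alert condition does with the query result.
total_dbus = sum(dbus for _, _, dbus in rows)
if total_dbus > DBU_THRESHOLD:
    print(f"ALERT: {total_dbus} DBUs exceeds threshold {DBU_THRESHOLD}")
```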

[Screenshot: alert configuration with the DBU threshold condition]

The ‘Custom template’ option lets us define any email format we want; the mail body has to be provided as HTML. It also lets us include built-in template variables in the mail body. These are documented under the Formatting Guide hyperlink just below the template.

For example, we can set this in the email template:

 

DBU used: {{QUERY_RESULT_VALUE}}
<br/><br/>
{{QUERY_RESULT_TABLE}}
<br/><br/>
DBU threshold: {{ALERT_THRESHOLD}}
<br/><br/>
Please find the dashboard link <a href="https://<workspace_url>/sql/dashboards/<dashboard_id>?o=<workspace_id>">here</a>.

 

Here, <workspace_url>, <dashboard_id>, and <workspace_id> have to be replaced with the proper values.

Connect dashboard and alert using a Databricks Workflow

We can set individual refresh schedules for the dashboard and alert, or we can connect these together in a single workflow so that every time the dashboard is refreshed, the alert is also executed to check if the threshold limit is exceeded.
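In Jobs API terms, this workflow is two SQL tasks where the alert task depends on the dashboard refresh. A sketch of the job payload (the IDs are placeholders, and the field names are our reading of the Jobs 2.1 sql_task schema):

```python
# Hypothetical Jobs API 2.1 payload: refresh the dashboard, then run the alert.
job_payload = {
    "name": "billing-usage-monitor",
    "schedule": {"quartz_cron_expression": "0 0 6 * * ?", "timezone_id": "UTC"},
    "tasks": [
        {
            "task_key": "refresh_dashboard",
            "sql_task": {
                "dashboard": {"dashboard_id": "<dashboard_id>"},
                "warehouse_id": "<warehouse_id>",
            },
        },
        {
            "task_key": "run_alert",
            # The alert only runs after the dashboard refresh completes.
            "depends_on": [{"task_key": "refresh_dashboard"}],
            "sql_task": {
                "alert": {"alert_id": "<alert_id>"},
                "warehouse_id": "<warehouse_id>",
            },
        },
    ],
}

print(job_payload["tasks"][1]["depends_on"][0]["task_key"])  # refresh_dashboard
```

The same structure can of course be built in the Workflows UI without touching the API.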

[Screenshot: workflow with a dashboard refresh task followed by an alert task]

We can set a schedule to run this job daily to refresh the dashboard. If the alert threshold is exceeded, an email will be sent to the alert subscribers.

[Screenshot: custom alert email received by subscribers]

This custom email will be sent to the subscribers if the alert gets triggered.

Conclusion

In this article, we learned how to use the billing system tables to get insights into our DBU consumption, visualize usage in dashboards, and set alerts. We also discussed how Workflows can be used to refresh the dashboard and execute the alert to send out custom email notifications.