cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
steven_muschler
Databricks Employee
Databricks Employee

Introduction

System tables are a powerful data source to gain rich observability into your Databricks account. By far, the most popular system tables are the system.billing tables, around usage and pricing. In this article, we share tips from subject matter experts on customizing your insights into cost reporting 

In order to accelerate the insights you can gain from the system.billing tables, Databricks provides an out-of-the-box, pre-configured AI/BI dashboard that users can import into a workspace of choice. The Usage Dashboard provides immediate insights into:

  • Total usage per workspace, SKU, billing_origin_product
  • Usage per tag key, value pair, to support cost attribution (.e.g., usage per ‘CostCenter’ tag)
  • Top N usage, per usage metadata field (e.g., top 10 job_ids)

Once imported, the Usage Dashboard is a local dashboard asset that the user is the owner of. By design, the Usage Dashboard can be cloned, modified, and shared as needed.

 👀 Also keep an eye out for the new v2 Usage Dashboard slated to arrive in the next few months as that version will come with a plethora of enhancements that further simplify how to gain insights on Databricks usage 👀  

Permissions & access to Usage Dashboard 

Today, account admins who are also metastore admins can import the Usage Dashboard, and share it widely.  Using either the Account Console UI, or Account API, admins can import a copy of the Usage Dashboard into a Unity Catalog-enabled workspace. By default, only the user who imported the Usage Dashboard will have “Can Manage” permissions to it.

Like any AI/BI Dashboard, users can share & publish the Usage Dashboard to other users in their workspace & account. Users can be granted “Can Manage”, “Can Edit”, “Can View/Can Run” permissions.

Note: A user must also have SELECT permissions to system.billing.usage and system.billing.list_prices tables, for the dashboard queries to successfully execute.

Publish Usage Dashboard with embedded credentials

To de-couple permissions to the Dashboard from permissions to the system.billing tables, admins can publish the Usage Dashboard with embedded credentials, and then share it more widely. 

When published with embedded credentials, Dashboard viewers will use the publisher’s data permissions (i.e., can use the account admin’s data permission to system.billing tables). However, they will not be granted direct access to the system.billing tables through this method. 

In this way, account admins can securely share the Usage Dashboard with workspace admins, and workspace users, to promote greater cost transparency.

Embed the Usage Dashboard in an external website or application 

For greater transparency into usage & cost metrics, Dashboard creators can even embed the Usage Dashboard in a 3rd-party site, such as Confluence or Sharepoint. This will be done largely by copying the dashboard’s embed code, and inserting the code snippet into the target site page. 

Create account-wide or workspace-scoped Usage Dashboards programmatically

Via the UI and API, account admin can specify if they want to create Usage Dashboard at account-level, or create one scoped to a specific workspace. In order to create a workspace-level Usage Dashboard in every target workspace, users can do so in batch via the usage dashboard creation API.

With this API, the principal must specify the dashboard_type (workspace-level, or account-level) as well as the workspace_id to create the Dashboard asset in.  

Factoring in Discounts & Adjusting currency

Databricks provides a copy of the Usage Dashboard that is ready for immediate use; however, once imported, users are able to edit and customize the dashboard as they see fit.  Similar to any other Databricks AI/BI Dashboard, the datasets the Usage Dashboard uses can be customized by using the Data tab.  The datasets within the Data tab ultimately power the dashboard visualizations.  One commonly done customization with the Usage Dashboard involves factoring in contract specific discounts as the dashboard, by default, uses list prices.  There are two primary approaches to accomplish this.

Static Multiplier

This scenario will only work if the customer specific discount is the same across all SKUs.  Let’s say a customer has a flat 10% discount across all SKUs as part of their contract.  If so, one could update the following portion of each dashboard query as follows:

coalesce(u.usage_quantity * p.pricing.effective_list.default, 0) as usage_usd,

To

coalesce(u.usage_quantity * p.pricing.effective_list.default * 0.9, 0) as usage_usd,

To further streamline this, one can also create a dashboard parameter so that this value can be updated on demand. 

This can be done by first adding a parameter to the respective query on the Data tab

Screenshot 2025-04-15 at 8.19.48 PM.png

and then subsequently adding a new widget for that parameter on the Canvas tab

Screenshot 2025-04-15 at 8.20.45 PM.png

Mapping Table

Another option that takes more work to set up, but can handle variable discounts by SKU and by time is to create a mapping table.

Example Table Creation

CREATE OR REPLACE TABLE discounts AS
SELECT sku_name, usage_unit, CAST('2025-01-01T00:00:00.000' AS TIMESTAMP) as contract_start_time, CAST(null AS TIMESTAMP) as contract_end_time, 0.0 as discount
FROM system.billing.list_prices

Note that the discount percentages need to be updated manually in the table created.

Once the table has been created, the dashboard queries can be updated to factor in discounts, if any.  A common table expression for discount_pricing needs to be added and the list_priced_usd subquery needs to be updated to factor in the discount_pricing data.

Update Dashboard Queries

discount_pricing as (
 select coalesce(contract_end_time, date_add(current_date, 1)) as coalesced_contract_end_time, *
 from discounts
),
list_priced_usd as (
 select
   coalesce(u.usage_quantity * p.pricing.effective_list.default * coalesce((1 - d.discount), 1), 0) as usage_usd,
   d.*,
   date_trunc('QUARTER', usage_date) as usage_quarter,
   date_trunc('MONTH', usage_date) as usage_month,
   date_trunc('WEEK', usage_date) as usage_week,
   u.*
 from usage_filtered as u
 left join prices as p
   on u.sku_name=p.sku_name
   and u.usage_unit=p.usage_unit
   and (u.usage_end_time between p.price_start_time and p.coalesced_price_end_time)
 left join discount_pricing as d
   on u.sku_name=d.sku_name
   and u.usage_unit=d.usage_unit
   and (u.usage_end_time between d.contract_start_time and d.coalesced_contract_end_time)
),

Either of the above options can also be used for converting the dashboard into a different currency.  The static option could be used to apply a consistent exchange rate to get a general estimate of cost in another currency.  The mapping table could be used to apply exchange rates on a more frequent basis, such as daily.  This would result in a more accurate conversion.  

Although converting currencies or applying discounts requires additional work, please keep an eye out for new product features to continue to make this experience easier and easier over time.  For AWS and GCP customers, keep an eye out for a contract pricing system table as a new feature that will streamline how to factor in discounts.  For Azure Databricks customers, an alternate approach to factoring in discounts would be to import Azure billing data into Databricks as discounts are included in that data.  Additionally, please reach out to your account team or via the feedback portal to provide feedback on exactly what enhancements you would like to see baked into the product regarding the usage dashboard

<OTHERS> and null values

Towards the bottom of the dashboard is a section called Usage Analysis: Top Spending.  This section is excellent for finding the top N of most costly resources over a time period.  For example, one could use this to find the top 10 most costly clusters.  The following screenshot illustrates this scenario.

Screenshot 2025-04-09 at 8.50.27 PM.png

What can be confusing in this section is that there are large amounts of usage that either null or <OTHERS>.  <OTHERS> in this case ($27.95K on 2/24/25) represents the total amount of cost for resources with a cluster id, excluding the top 10 most expensive as those are split out individually.  null represents the spend that cannot be associated with a cluster_id.  In this case, on 2/24/25, $136.61K of spend was incurred by resources without a cluster_id.  Some example SKUs that will not have a cluster_id are PREMIUM_SQL_COMPUTE and PREMIUM_JOBS_SERVERLESS_COMPUTE_US_EAST_2.  In case a user does not want to look at null, there is a dropdown that can be toggled to exclude spend associated with a null value.

Conclusion

Understanding Databricks usage is paramount to success to ensure that the value generated relative to cost spent meets the desired ROI.  As shown, the Usage Dashboard is a fantastic, and customizable, way to start answering these questions.  Additionally, keep an eye out for additional announcements and product developments as Databricks continues to innovate, and simplify cost management across the platform.