cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
steven_muschler
Databricks Employee
Databricks Employee

Introduction

As organizations scale their usage of cloud SaaS/PaaS/IaaS solutions, such as Databricks, it is increasingly important to ensure an appropriate understanding of the costs incurred.  Cloud technology has flipped the script from traditional on-premise infrastructure as bills now are incurred after usage instead of prior to the purchase of something like a server rack.  This model enables greater flexibility and agility for businesses, but also can present unexpected surprises at the end of the month that were not anticipated, such as being charged for a cluster over the weekend that did not have auto-termination enabled.  Databricks provides a variety of mechanisms for cost observability and management, but two issues I have seen many times with customers to date are misinterpreting the list prices system table and incorrectly joining the usage system table with the list prices system table.  Those mistakes can result in wildly inaccurate cost data.  This post seeks to demystify and help others understand some of these key nuances.

Are List Prices my prices?

The short answer to the question is… maybe. List prices includes pricing data for every Databricks SKU, but it is extremely important to note that the prices are the sticker price listed on the website and not necessarily the price a customer is paying. Many organizations have contracts with Databricks that result in different SKU prices. For example, let’s say an organization has a 10% discount for all SKUs as part of their contract. If a user uses the list price ($0.39) to calculate the cost of 100 DBUs for the SKU PREMIUM_JOBS_SERVERLESS_COMPUTE_EUROPE_FRANKFURT, then the cost would be $39. With a 10% discount, the actual price the organization pays per DBU is $0.351 which results in a total cost of $35.10, not $39. What should also be noted is that list prices do not include other variables that may appear on the final invoice, such as add-ons like Enhanced Security and Compliance or taxes.

But wait… there’s more

Occasionally, Databricks will run a SKU specific discount for all customers. For example, there is currently, as of January 2025, a 50% discount on Serverless Jobs and that discount does show in the list prices table if you look under promotional and effective list price, one can see a price of $0.195. In this case, the discount for all customers and a customer specific discount will compound. If we start with the effective list price of $0.195 and then multiply that by 0.9 for the customer specific discount, then we are left with $0.17755 as the cost per DBU for this SKU. Applying this to 100 DBUs, we then get a total cost of $17.55.

Screenshot 2024-12-28 at 1.57.22 PM.pngWe generally recommend using pricing.effective_list.default instead of pricing.default to ensure customer-wide promotions, such as the 50% discount are factored in. Additionally, the Databricks Usage Dashboard leverages the effective_list.default price. In some scenarios, it might make sense to use pricing.default to project future estimates as those will not include customer-wide, temporary discounts.

Beware of Incorrect Joins

Costs are typically calculated by taking the number of DBUs from the usage table and then multiplying them by the SKU cost in the list prices table.  If one is not careful with the join logic, they may end up with wildly incorrect costs.  Each SKU can have more than one record as SKUs have effective start and end dates (the current price has an end date of null) as prices can change over time.  If one does not take that into account when joining, then the usage message will be duplicated and greatly inflate the data.

Screenshot 2024-12-28 at 1.57.41 PM.png

The following query is an example of a join that makes this mistake.  It only factors in the SKU name in the join condition and therefore will duplicate the usage records for each record in the list prices table.

Do Not Use This Join!

SELECT usage.sku_name, usage.usage_date, SUM(usage.usage_quantity * list_prices.pricing.effective_list.default) AS total_list_cost
FROM system.billing.usage usage
JOIN system.billing.list_prices list_prices ON list_prices.sku_name = usage.sku_name
GROUP BY usage.sku_name, usage.usage_date

The following query is an example of the correct join which factors in the effective start date and end date.  Always make sure to include the correct join when joining the list prices and usage tables.

Use This Join Instead!

SELECT usage.sku_name, usage.usage_date, SUM(usage.usage_quantity * list_prices.pricing.effective_list.default) AS total_list_cost
FROM system.billing.usage
JOIN system.billing.list_prices ON list_prices.sku_name = usage.sku_name
 WHERE usage.usage_end_time >= list_prices.price_start_time
 AND (list_prices.price_end_time IS NULL OR usage.usage_end_time < list_prices.price_end_time)
GROUP BY usage.sku_name, usage.usage_date

As a general reminder, if you ever see a price listed in Databricks, double check the documentation to see if the price is based on list prices or not.  When in doubt, check with your Databricks account team.

Additional Resources