Databricks Model Serving provides a scalable, low-latency hosting service for AI models. It supports models ranging from small custom models to best-in-class large language models (LLMs). In this blog we’ll describe the pricing model associated with Databricks Model Serving and demonstrate how to allocate costs per endpoint or per use case.
Databricks Model Serving now includes three distinct pricing methods. Regardless of which method you choose, the price is inclusive of all cloud infrastructure costs.
The best way to track model serving costs in Databricks is through the billable usage system table. Once enabled, the table automatically populates with the latest usage in your Databricks account. No matter which of the three model serving methods you choose, your costs will appear in the system.billing.usage table with a sku_name of either:
<tier>_SERVERLESS_REAL_TIME_INFERENCE_LAUNCH_<region>
which includes all DBUs accrued when an endpoint starts after scaling to zero. All other model serving costs are grouped under:
<tier>_SERVERLESS_REAL_TIME_INFERENCE_<region>
where tier corresponds to your Databricks platform tier and region corresponds to the cloud region of your Databricks deployment.
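Before building cost reports, it can help to confirm which serving SKU names actually appear in your account (and which tier and region values they carry). A quick way to do that is to list the distinct SKU names in the usage table described above:

```sql
-- List the model serving SKU names present in this account.
SELECT DISTINCT
  sku_name
FROM
  system.billing.usage
WHERE
  sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%'
```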
You can easily query the system.billing.usage table to aggregate all DBUs (Databricks Units) associated with Databricks model serving. Here is an example query that aggregates model serving DBUs per day for the last 30 days:
SELECT
  usage_date,
  SUM(usage_quantity) AS model_serving_dbus
FROM
  system.billing.usage
WHERE
  sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%'
  AND usage_date > DATE_SUB(CURRENT_DATE(), 30)
GROUP BY
  usage_date
ORDER BY
  usage_date DESC
Aggregated costs may be sufficient for simple use cases, but as the number of endpoints grows it becomes desirable to break out costs by use case, business unit, or other custom identifiers. Optional key/value tags can be applied to custom model endpoints. All custom tags applied to Databricks Model Serving endpoints propagate to the system.billing.usage table under the custom_tags column and can be used to aggregate and visualize costs. Databricks recommends adding descriptive tags to each endpoint for precise cost tracking.
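Because custom_tags is a map column, a single known tag key can also be read directly with map subscript syntax, without exploding every tag into its own row. The sketch below assumes a hypothetical tag key named "cost_center"; substitute whichever key you apply to your endpoints:

```sql
-- Daily model serving DBUs grouped by a single (hypothetical) tag
-- key, "cost_center", read directly from the custom_tags map.
SELECT
  usage_date,
  custom_tags['cost_center'] AS cost_center,
  SUM(usage_quantity) AS dbus
FROM
  system.billing.usage
WHERE
  sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%'
  AND custom_tags['cost_center'] IS NOT NULL
GROUP BY
  usage_date,
  custom_tags['cost_center']
```

The EXPLODE() approach shown next is more general when you want to filter on an arbitrary key chosen at query time.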
Below is an example query that separates model serving costs by values of a specific tag for the Databricks account over the last 30 days.
SELECT
  value,
  SUM(usage_quantity) AS DBUs
FROM
  (
    SELECT
      usage_date,
      usage_quantity,
      -- Use the built-in EXPLODE() function to create a new row per tag.
      EXPLODE(custom_tags)
    FROM
      system.billing.usage
    WHERE
      sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%'
      AND usage_date > DATE_SUB(CURRENT_DATE(), 30)
  )
WHERE
  key = {{ filter_key }}
GROUP BY
  value
ORDER BY
  DBUs DESC
Running the query in the Databricks SQL Editor breaks out model serving costs by each value of the tag over the past month.
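DBU counts can also be translated into an estimated spend by joining against the pricing system table. This is a sketch, assuming the system.billing.list_prices table is enabled in your account and exposes a pricing.default rate per SKU; verify the column names against your schema before relying on the numbers:

```sql
-- Estimate daily model serving spend (sketch; assumes
-- system.billing.list_prices is available with a pricing.default column).
SELECT
  u.usage_date,
  SUM(u.usage_quantity * p.pricing.default) AS estimated_cost
FROM
  system.billing.usage u
  JOIN system.billing.list_prices p
    ON u.sku_name = p.sku_name
    AND u.usage_start_time >= p.price_start_time
    AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
WHERE
  u.sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%'
  AND u.usage_date > DATE_SUB(CURRENT_DATE(), 30)
GROUP BY
  u.usage_date
ORDER BY
  u.usage_date DESC
```

The time-range condition on the join matters because list prices change over time; each usage record should be priced at the rate in effect when the usage occurred.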
This is just the start of what you can view and visualize using the system.billing.usage table in Databricks! Stay tuned as Databricks plans to roll out additional tables and metrics within the system catalog.
Authors: Sean Wilkinson, Ashwin Srikant, Cathy Zdravevski