
Authors: Sean Wilkinson, Ashwin Srikant, Cathy Zdravevski

Databricks Model Serving provides a scalable, low-latency hosting service for AI models. It supports models ranging from small custom models to best-in-class large language models (LLMs). In this blog we’ll describe the pricing model associated with Databricks Model Serving and demonstrate how to allocate costs per endpoint or per use case.

Understanding billable usage

Databricks Model Serving now includes three distinct pricing methods. Regardless of the method you choose, the price is inclusive of all cloud infrastructure costs. The three different methods are covered briefly here:

  • Model and Feature Serving: Choose a compute type (CPU/GPU) and a size that corresponds to a range of concurrent requests that the endpoint can handle. The serverless endpoint scales within this range and you pay for the actual compute allocated. If “scale to zero” is enabled, a $0.07 charge per launch is incurred (max 2/hour).
  • Foundation Model APIs (Provisioned Throughput): Intended for large language model use cases that require consistent, low-latency responses with high concurrency. It provides dedicated compute that scales within a configured range. Databricks only charges for the actual compute used.
  • Foundation Model APIs (Pay-per-Token): Choose one of the available state-of-the-art Foundation Models and query it directly. Customers pay only for the input and output tokens consumed and produced by the model.
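To make the scale-to-zero launch charge concrete, here is a small Python sketch of the upper bound implied by the figures above ($0.07 per launch, at most 2 charged launches per hour). The function name and the 30-day month are illustrative assumptions, and it reads "max 2/hour" as a cap on charged launches. Actual charges depend on how often an endpoint really wakes from zero.

```python
from decimal import Decimal

LAUNCH_CHARGE_USD = Decimal("0.07")  # per launch, from the pricing description above
MAX_LAUNCHES_PER_HOUR = 2            # cap stated above (assumed to apply to charged launches)


def max_launch_cost(hours: int) -> Decimal:
    """Upper bound on launch charges for one endpoint over `hours` hours."""
    return LAUNCH_CHARGE_USD * MAX_LAUNCHES_PER_HOUR * hours


daily_cap = max_launch_cost(24)
monthly_cap = max_launch_cost(24 * 30)  # assumes a 30-day month
print(f"daily cap: ${daily_cap}, 30-day cap: ${monthly_cap}")
```

This bound covers launch charges only; compute consumed while the endpoint is running is billed separately.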

The best way to track model serving costs in Databricks is through the billable usage system table. Once enabled, the table automatically populates with the latest usage in your Databricks account. No matter which of the three model serving methods you choose, your costs will appear in the system.billing.usage table with column sku_name as either:

<tier>_SERVERLESS_REAL_TIME_INFERENCE_LAUNCH_<region>

which includes all DBUs accrued when an endpoint starts after scaling to zero. All other model serving costs are grouped under: 

<tier>_SERVERLESS_REAL_TIME_INFERENCE_<region>

where tier corresponds to your Databricks platform tier and region corresponds to the cloud region of your Databricks deployment. 
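The naming pattern above can be taken apart programmatically, for example when post-processing exported usage rows. The following Python sketch assumes that every model serving SKU follows the two patterns shown; the helper name and the sample SKU strings in the usage note are illustrative, not an exhaustive list of real SKUs.

```python
import re
from typing import NamedTuple, Optional

# <tier>_SERVERLESS_REAL_TIME_INFERENCE[_LAUNCH]_<region>
SKU_PATTERN = re.compile(
    r"^(?P<tier>.+?)_SERVERLESS_REAL_TIME_INFERENCE(?P<launch>_LAUNCH)?_(?P<region>.+)$"
)


class ServingSku(NamedTuple):
    tier: str
    is_launch: bool  # True for scale-from-zero launch charges
    region: str


def parse_serving_sku(sku_name: str) -> Optional[ServingSku]:
    """Split a model serving SKU name into tier, launch flag, and region.

    Returns None when the name does not follow the model serving pattern.
    """
    match = SKU_PATTERN.match(sku_name)
    if match is None:
        return None
    return ServingSku(match["tier"], match["launch"] is not None, match["region"])
```

For instance, `parse_serving_sku("PREMIUM_SERVERLESS_REAL_TIME_INFERENCE_LAUNCH_US_EAST_N_VIRGINIA")` yields a `ServingSku` with tier `PREMIUM`, `is_launch=True`, and region `US_EAST_N_VIRGINIA`, while an unrelated SKU name yields `None`.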

Querying and visualizing Model Serving usage

You can easily query the system.billing.usage table to aggregate all DBUs (Databricks Units) associated with Databricks model serving. Here is an example query that aggregates model serving DBUs per day for the last 30 days:

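SELECT
  usage_date,
  SUM(usage_quantity) AS model_serving_dbus
FROM
  system.billing.usage
WHERE
  sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%'
  -- Restrict to the last 30 calendar days rather than the last 30 rows.
  AND usage_date > DATE_SUB(CURRENT_DATE(), 30)
GROUP BY
  usage_date
ORDER BY
  usage_date DESC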

[Figure: Model Serving DBUs per Day For Last 30 Days]

Cost attribution with custom tags

Aggregated costs may be sufficient for simple use cases, but as the number of endpoints grows it becomes useful to break out costs by use case, business unit, or other custom identifiers. Optional key/value tags can be applied to custom model endpoints. All custom tags applied to Databricks Model Serving endpoints propagate to the system.billing.usage table under the custom_tags column and can be used to aggregate and visualize costs. Databricks recommends adding descriptive tags to each endpoint for precise cost tracking.

[Figure: Applying Custom Tags]
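Tags can also be prepared programmatically before an endpoint is created or updated. The Python sketch below only builds the tag portion of a request body and sends nothing. It assumes tags are expressed as a list of key/value objects under a `tags` field, which should be confirmed against the serving endpoints API reference. The helper name, endpoint name, and tag values are hypothetical.

```python
import json


def build_endpoint_tags(tags: dict) -> list:
    """Convert a plain dict into a list of {"key": ..., "value": ...} objects.

    Keys are sorted so the output is deterministic; empty keys are rejected
    because they would be useless for cost attribution.
    """
    for key in tags:
        if not key:
            raise ValueError("tag keys must be non-empty strings")
    return [{"key": key, "value": value} for key, value in sorted(tags.items())]


# Hypothetical endpoint name and tags, for illustration only.
payload_fragment = {
    "name": "fraud-scoring",
    "tags": build_endpoint_tags({"owner": "alice", "business_unit": "risk"}),
}
print(json.dumps(payload_fragment, indent=2))
```

Using a small, consistent set of tag keys (for example one for the owner and one for the business unit) across all endpoints keeps later cost queries simple.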

Below is an example query that separates model serving costs by values of a specific tag for the Databricks account over the last 30 days.

SELECT
  value,
  SUM(usage_quantity) AS DBUs
FROM
  (
    SELECT
      usage_quantity,
      -- EXPLODE() on the custom_tags map creates one row per tag,
      -- with the tag name in column "key" and the tag value in column "value".
      EXPLODE(custom_tags)
    FROM
      system.billing.usage
    WHERE
      sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%'
      AND usage_date > DATE_SUB(CURRENT_DATE(), 30)
  )
WHERE
  -- filter_key is a query parameter holding the tag name to break costs out by.
  key = {{ filter_key }}
GROUP BY
  value
ORDER BY
  DBUs DESC

Running the query in the Databricks SQL Editor breaks out model serving costs by value of the tag over the past month:

[Figure: DBUs by Endpoint Owner Over Past 30 Days]
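To see what the explode-then-filter step does to the data, the following Python sketch replays the same logic on a few in-memory rows. The rows, tag names, and quantities are made up for illustration and only mimic the usage_quantity and custom_tags columns. It is a sketch of the aggregation logic, not a substitute for running the query against system.billing.usage.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows shaped like (usage_quantity, custom_tags).
rows = [
    (Decimal("12.5"), {"owner": "alice", "team": "search"}),
    (Decimal("3.0"), {"owner": "bob"}),
    (Decimal("7.5"), {"owner": "alice"}),
    (Decimal("4.0"), {}),  # untagged usage
]


def dbus_by_tag_value(rows, filter_key):
    """Sum DBUs per value of one tag key, largest total first."""
    totals = defaultdict(Decimal)
    for usage_quantity, custom_tags in rows:
        # Exploding a map yields one (key, value) row per entry,
        # so a row with an empty map contributes nothing.
        for key, value in custom_tags.items():
            if key == filter_key:
                totals[value] += usage_quantity
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


result = dbus_by_tag_value(rows, "owner")
for owner, dbus in result:
    print(owner, dbus)
```

Note that usage from endpoints without the chosen tag drops out of the result entirely, so the per-tag totals can be lower than the overall model serving total.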


Conclusion

This is just the start of what you can view and visualize using the system.billing.usage table in Databricks! Stay tuned as Databricks plans to roll out additional tables and metrics within the system catalog.