Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Which types of model serving endpoints have health metrics available?

KyraHinnegan
New Contributor II

I am retrieving the list of model serving endpoints for my workspace via this API: https://docs.databricks.com/api/workspace/servingendpoints/list
Then I retrieve health metrics for each one with: https://[DATABRICKS_HOST]/api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics

However, some of these /metrics calls return 404 errors because no metrics are available for the endpoint.

I have noticed that in my case these are all endpoints of type FOUNDATION_MODEL_API, but I don't have many examples, so I would like to know whether this is always the case, and what the full list of situations is in which an endpoint would not have metric data, based on the response from the first API call to servingendpoints/list. My use case is to add filtering to make my script more efficient, so that I don't call /metrics on any endpoint that I know will not have metrics available to query.
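For reference, the two-call pattern described above can be sketched as follows. Only the URL shapes come from the post; the env-var names, host fallback, and helper names are hypothetical:

```python
import json
import os
import urllib.error
import urllib.request

# Hypothetical placeholders: substitute your own workspace host and token.
HOST = os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com")
TOKEN = os.environ.get("DATABRICKS_TOKEN", "")


def list_url(host: str) -> str:
    """URL for the serving-endpoints list API."""
    return f"{host.rstrip('/')}/api/2.0/serving-endpoints"


def metrics_url(host: str, endpoint_name: str) -> str:
    """URL for the per-endpoint health metrics exporter."""
    return f"{host.rstrip('/')}/api/2.0/serving-endpoints/{endpoint_name}/metrics"


def _get(url: str, token: str) -> tuple[int, str]:
    """GET with a bearer token; return (status, body) instead of raising on 404."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as e:
        return e.code, e.read().decode()


def fetch_all_metrics(host: str, token: str) -> dict[str, str]:
    """List endpoints, then try /metrics for each; 404s are recorded, not raised."""
    _, body = _get(list_url(host), token)
    results: dict[str, str] = {}
    for ep in json.loads(body).get("endpoints", []):
        code, payload = _get(metrics_url(host, ep["name"]), token)
        results[ep["name"]] = payload if code == 200 else f"HTTP {code}"
    return results
```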

1 REPLY

Louis_Frolio
Databricks Employee

Hey @KyraHinnegan,

I did some digging and here is what I found. Hopefully it helps you understand a bit more about what is going on.

At a high level, not every endpoint type exposes infrastructure health metrics via /metrics. What you’re seeing with FOUNDATION_MODEL_API returning a 404 is expected right now.

The /api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics endpoint is the health metrics exporter. This is where you get latency, request rate, error rate, CPU, memory, GPU — all the signals you’d typically pipe into Prometheus or Datadog.
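Since these are the signals you would normally scrape into Prometheus, a minimal parser for the exporter's payload might look like the sketch below. It assumes the exporter returns the standard Prometheus text exposition format and that label values contain no embedded spaces; neither assumption is confirmed in this thread, so verify against a real response:

```python
def parse_prometheus_text(body: str) -> dict[str, float]:
    """Parse a Prometheus text-exposition payload into {sample_key: value}.

    Minimal sketch: skips comment lines (# HELP / # TYPE) and keys each
    sample by its full name plus label set. Assumes label values contain
    no spaces and ignores optional trailing timestamps.
    """
    samples: dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: <name>{labels} <value> [timestamp]
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])
    return samples
```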

That exporter is wired up for what I’d call the “standard” Mosaic AI Model Serving endpoints. Concretely, that includes endpoints where the served entities are:

  • CUSTOM_MODEL

  • FEATURE_SPEC (feature serving)

  • EXTERNAL_MODEL

  • FOUNDATION_MODEL (provisioned throughput)

Where things diverge is with the Databricks-hosted Foundation Model API endpoints — the pay-per-token ones. These show up as FOUNDATION_MODEL_API when you call /serving-endpoints/list.

Those are backed by a separate multi-tenant system, and as of today, they don’t expose infrastructure health metrics through /metrics. So a 404 there isn’t a misconfiguration — it’s just the current behavior. What you’re seeing lines up with that.

There are also a couple of edge cases worth keeping in mind.

First, some newer foundation models (for example, Llama 4 Maverick) don’t have full metrics support yet. Even when they’re provisioned throughput endpoints, the metrics surface can be more limited than what you’d get with a typical custom model.

Second, endpoint state matters. If an endpoint isn’t in a healthy READY state — say it’s still provisioning, failed, or mid-update — /metrics can fail. In practice, you want to treat READY as a prerequisite before even attempting the call.

Given your goal — avoiding /metrics calls where they’re guaranteed to fail — the cleanest approach is to filter up front based on /serving-endpoints/list.

Only attempt /metrics when:

  • state.ready is true and the endpoint is not updating or being deleted, and

  • every served_entities[*].entity_type is one of:

    • CUSTOM_MODEL

    • FEATURE_SPEC

    • EXTERNAL_MODEL

    • FOUNDATION_MODEL (provisioned throughput)

Skip /metrics when:

  • any served entity has entity_type = FOUNDATION_MODEL_API, or

  • the endpoint is not in a READY state
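The attempt/skip rules above can be sketched as a single predicate over each entry returned by /serving-endpoints/list. The field names (state.ready, state.config_update, served_entities[*].entity_type) and the state values follow the description in this reply; double-check them against the actual list response in your workspace before relying on this:

```python
# Entity types whose endpoints expose the /metrics health exporter,
# per the list above (assumption: the list is exhaustive as of today).
METRIC_CAPABLE_TYPES = {
    "CUSTOM_MODEL",
    "FEATURE_SPEC",
    "EXTERNAL_MODEL",
    "FOUNDATION_MODEL",  # provisioned throughput
}


def should_fetch_metrics(endpoint: dict) -> bool:
    """Return True only when /metrics is expected to succeed for this endpoint.

    Field names and state values are assumptions based on the discussion
    above; verify them against your /serving-endpoints/list response.
    """
    state = endpoint.get("state", {})
    # Readiness is typically reported as the string "READY"; accept a boolean too.
    if state.get("ready") not in (True, "READY"):
        return False
    # Skip endpoints that are mid-update (value names are assumptions).
    if state.get("config_update") in ("IN_PROGRESS", "UPDATE_FAILED"):
        return False
    entities = endpoint.get("config", {}).get("served_entities", [])
    # Every served entity must be a supported type; any FOUNDATION_MODEL_API
    # (pay-per-token) entity means a guaranteed 404 from /metrics.
    return bool(entities) and all(
        e.get("entity_type") in METRIC_CAPABLE_TYPES for e in entities
    )
```

You would then call /metrics only for endpoints where this predicate returns True, which matches the filtering goal described in the original question.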

That gives you a deterministic way to avoid 404s while still collecting metrics everywhere they actually exist.

If you need telemetry for the Foundation Model API endpoints — things like token usage — that lives in AI Gateway usage tracking and the system.serving.endpoint_usage and system.serving.served_entities system tables, not /metrics. So it’s a different surface area depending on what you’re measuring.

To be clear, nothing is broken here. It’s just two different endpoint architectures with different observability paths, and your filter is the right way to reconcile them.

Hope this helps.

Cheers, Louis