
Which types of model serving endpoints have health metrics available?

KyraHinnegan — Wed, 18 Mar 2026 02:50:55 GMT

I am retrieving a list of model serving endpoints for my workspace via this API: https://docs.databricks.com/api/workspace/servingendpoints/list
I then retrieve health metrics for each one with: https://[DATABRICKS_HOST]/api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics

However, some of these endpoints return 404 errors, with no metrics available.

I have noticed that in my case these are all of endpoint type FOUNDATION_MODEL_API, but I don't have many examples, so I would like to know whether this is always the case and what the full list of situations is in which an endpoint would not have metric data, based on the response from the first API call to servingendpoints/list. My use case is to add filtering so that my script is more efficient and does not call /metrics on any endpoint that I know has no metrics available to query.

Re: Which types of model serving endpoints have health metrics available?

Louis_Frolio — Wed, 18 Mar 2026 09:55:24 GMT

Hey @KyraHinnegan,

I did some digging and here is what I found. Hopefully it helps you understand a bit more about what is going on.

At a high level, not every endpoint type exposes infrastructure health metrics via /metrics. What you’re seeing with FOUNDATION_MODEL_API returning a 404 is expected right now.

The /api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics endpoint is the health metrics exporter. This is where you get latency, request rate, error rate, CPU, memory, GPU — all the signals you’d typically pipe into Prometheus or Datadog.
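
If you want to inspect the response yourself before handing it to a scraper, here is a minimal sketch. It assumes the body comes back in the Prometheus-style text exposition format (one `name{labels} value` sample per line, with `#` comment lines), which is an assumption on my part based on the Prometheus/Datadog use case. The function name `parse_metrics_text` and the metric names in the usage note are hypothetical.

```python
import re

# One sample line: metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$'
)
# One label pair inside the braces: key="value"
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics_text(body):
    """Parse exposition-format text into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and lines that do not look like
    samples are skipped rather than raising.
    """
    samples = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)  # also accepts NaN, +Inf, -Inf
        except ValueError:
            continue
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, value))
    return samples
```

For example, feeding it a body containing a line such as `cpu_usage_percentage{endpoint_name="demo"} 12.5` (an illustrative name, not one confirmed here) yields a tuple with that name, a one-entry label dict, and the float 12.5.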

That exporter is wired up for what I’d call the “standard” Mosaic AI Model Serving endpoints. Concretely, that includes endpoints where the served entities are:

CUSTOM_MODEL
FEATURE_SPEC (feature serving)
EXTERNAL_MODEL
FOUNDATION_MODEL (provisioned throughput)

Where things diverge is with the Databricks-hosted Foundation Model API endpoints — the pay-per-token ones. These show up as FOUNDATION_MODEL_API when you call /serving-endpoints/list.

Those are backed by a separate multi-tenant system, and as of today, they don’t expose infrastructure health metrics through /metrics. So a 404 there isn’t a misconfiguration — it’s just the current behavior. What you’re seeing lines up with that.

There are also a couple of edge cases worth keeping in mind.

First, some newer foundation models (for example, Llama 4 Maverick) don’t have full metrics support yet. Even when they’re provisioned throughput endpoints, the metrics surface can be more limited than what you’d get with a typical custom model.

Second, endpoint state matters. If an endpoint isn’t in a healthy READY state — say it’s still provisioning, failed, or mid-update — /metrics can fail. In practice, you want to treat READY as a prerequisite before even attempting the call.

Given your goal — avoiding /metrics calls where they’re guaranteed to fail — the cleanest approach is to filter up front based on /serving-endpoints/list.

Only attempt /metrics when:

state.ready is true and the endpoint is not updating or being deleted, and
every served_entities[*].entity_type is one of:
- CUSTOM_MODEL
- FEATURE_SPEC
- EXTERNAL_MODEL
- FOUNDATION_MODEL (provisioned throughput)

Skip /metrics when:

any served entity has entity_type = FOUNDATION_MODEL_API, or
the endpoint is not in a READY state

That gives you a deterministic way to avoid 404s while still collecting metrics everywhere they actually exist.
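
As a rough sketch of that filter in Python, operating on the already-decoded JSON for one endpoint from /serving-endpoints/list: the helper names are hypothetical, and the exact field layout is an assumption. I'm assuming `state.ready` is reported as the string "READY", that an in-progress update shows up as `state.config_update` being something other than "NOT_UPDATING", and that `served_entities` may sit either at the top level or under `config`. Check a real response from your workspace before relying on it.

```python
# Entity types listed above as having /metrics support.
METRICS_ENTITY_TYPES = frozenset(
    {"CUSTOM_MODEL", "FEATURE_SPEC", "EXTERNAL_MODEL", "FOUNDATION_MODEL"}
)


def _served_entities(endpoint):
    """Return served entities from the top level or from under 'config'."""
    config = endpoint.get("config") or {}
    return endpoint.get("served_entities") or config.get("served_entities") or []


def should_fetch_metrics(endpoint):
    """Return True only when /metrics is expected to exist for this endpoint."""
    state = endpoint.get("state") or {}

    # Assumption: readiness is the string "READY" (a literal True is accepted too).
    if state.get("ready") not in ("READY", True):
        return False

    # Assumption: a missing config_update field means no update is in flight.
    if state.get("config_update", "NOT_UPDATING") != "NOT_UPDATING":
        return False

    entities = _served_entities(endpoint)
    if not entities:
        return False

    # Every served entity must be a supported type; a single
    # FOUNDATION_MODEL_API entity is enough to skip the endpoint.
    return all(e.get("entity_type") in METRICS_ENTITY_TYPES for e in entities)
```

Usage would be something like `to_query = [ep for ep in endpoints if should_fetch_metrics(ep)]`, where `endpoints` is the list you got back from the first call. The function is deliberately conservative: anything it does not recognize is skipped.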

If you need telemetry for the Foundation Model API endpoints — things like token usage — that lives in AI Gateway usage tracking and the system.serving.endpoint_usage and system.serving.served_entities system tables, not /metrics. So it’s a different surface area depending on what you’re measuring.

To be clear, nothing is broken here. It’s just two different endpoint architectures with different observability paths, and your filter is the right way to reconcile them.

Hope this helps.

Cheers, Louis

Re: Which types of model serving endpoints have health metrics available?

johandoc — Tue, 28 Apr 2026 12:55:17 GMT

Your observation is correct—this behavior is expected.

Endpoints with entity_type = FOUNDATION_MODEL_API do not expose health metrics via the /metrics endpoint, which is why you’re getting 404 responses. These endpoints are fully managed, multi-tenant APIs (typically pay-per-token), so infrastructure-level metrics (CPU, memory, etc.) aren’t available through this API.

When /metrics is available

You’ll generally get metrics for infrastructure-backed serving endpoints, such as:

Custom model serving endpoints
Feature serving endpoints (feature specs)
External model endpoints
Foundation models with provisioned throughput

These support metrics like latency, request count, and error rates.

When /metrics is not available

You’ll typically see 404 in cases like:

entity_type = FOUNDATION_MODEL_API
Endpoint is not in a READY state
Endpoint has no active backing compute

Suggested filtering approach

To avoid unnecessary API calls:

Skip endpoints where entity_type == FOUNDATION_MODEL_API
Check that the endpoint is in a READY state before calling /metrics
Optionally inspect served_entities for more granular filtering
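
Even with filtering in place, it can be worth treating a 404 from /metrics as "no metrics for this endpoint" rather than as a failure. A minimal sketch using only the standard library: the function names are hypothetical, bearer-token authentication and a string-valued `state.ready` are assumptions, and `host` is expected without a scheme, as in the URL from the question.

```python
import urllib.error
import urllib.parse
import urllib.request


def metrics_url(host, endpoint_name):
    """Build the /metrics URL for one endpoint (host has no scheme)."""
    name = urllib.parse.quote(endpoint_name, safe="")
    return f"https://{host}/api/2.0/serving-endpoints/{name}/metrics"


def fetch_metrics(host, token, endpoint_name):
    """Return the metrics body as text, or None when the API answers 404."""
    request = urllib.request.Request(
        metrics_url(host, endpoint_name),
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no metrics exposed for this endpoint
        raise  # anything else is a real error


def collect_metrics(endpoints, fetch):
    """Call fetch(name) for each READY endpoint.

    Returns (results, skipped): results maps endpoint name to metrics text,
    skipped lists names that were not READY or returned no metrics.
    """
    results, skipped = {}, []
    for endpoint in endpoints:
        name = endpoint["name"]
        if (endpoint.get("state") or {}).get("ready") != "READY":
            skipped.append(name)
            continue
        body = fetch(name)
        if body is None:
            skipped.append(name)
        else:
            results[name] = body
    return results, skipped
```

In a real script you would pass something like `lambda name: fetch_metrics(host, token, name)` as `fetch`. Keeping it as a parameter also makes the loop easy to exercise with a stub instead of a live workspace.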

Alternative for FOUNDATION_MODEL_API

For these endpoints, use:

Usage tracking / billing data (token usage, request logs)
AI Gateway or system tables for observability