03-17-2026 07:50 PM
I am retrieving a list of model serving endpoints for my workspace via this API: https://docs.databricks.com/api/workspace/servingendpoints/list
And then going to retrieve health metrics for each one with: https://[DATABRICKS_HOST]/api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics
However, some of these endpoints return 404 errors because no metrics are available.
I have noticed that in my case these are all endpoints of type FOUNDATION_MODEL_API, but I don't have many examples, so I would like to know whether this is always the case and what the full list of situations is where an endpoint would not have metric data, based on the response from the first API call to servingendpoints/list. My use case is to add some filtering to improve the efficiency of my script, so I don't have to call /metrics on any endpoint that I know would not have metrics available to query.
03-18-2026 02:55 AM
Hey @KyraHinnegan,
I did some digging and here is what I found. Hopefully it helps you understand a bit more about what is going on.
At a high level, not every endpoint type exposes infrastructure health metrics via /metrics. What you're seeing with FOUNDATION_MODEL_API returning a 404 is expected right now.
The /api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics endpoint is the health metrics exporter. This is where you get latency, request rate, error rate, CPU, memory, and GPU: all the signals you'd typically pipe into Prometheus or Datadog.
That exporter is wired up for what I'd call the "standard" Mosaic AI Model Serving endpoints. Concretely, that includes endpoints where the served entities are:
CUSTOM_MODEL
FEATURE_SPEC (feature serving)
EXTERNAL_MODEL
FOUNDATION_MODEL (provisioned throughput)
Where things diverge is with the Databricks-hosted Foundation Model API endpoints, the pay-per-token ones. These show up as FOUNDATION_MODEL_API when you call /serving-endpoints/list.
Those are backed by a separate multi-tenant system, and as of today they don't expose infrastructure health metrics through /metrics. A 404 there isn't a misconfiguration; it's just the current behavior, and it lines up with what you're seeing.
There are also a couple of edge cases worth keeping in mind.
First, some newer foundation models (for example, Llama 4 Maverick) don't have full metrics support yet. Even when they're served on provisioned throughput endpoints, the metrics surface can be more limited than what you'd get with a typical custom model.
Second, endpoint state matters. If an endpoint isn't in a healthy READY state (say it's still provisioning, failed, or mid-update), /metrics can fail. In practice, treat READY as a prerequisite before even attempting the call.
Given your goal of avoiding /metrics calls where they're guaranteed to fail, the cleanest approach is to filter up front based on the /serving-endpoints/list response.
Only attempt /metrics when:
state.ready is READY and the endpoint is not mid-update or being deleted, and
every served_entities[*].entity_type is one of:
CUSTOM_MODEL
FEATURE_SPEC
EXTERNAL_MODEL
FOUNDATION_MODEL (provisioned throughput)
Skip /metrics when:
any served entity has entity_type = FOUNDATION_MODEL_API, or
the endpoint is not in a READY state
That gives you a deterministic way to avoid 404s while still collecting metrics everywhere they actually exist.
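To make the filter concrete, here is a minimal Python sketch using only the standard library. It assumes the response shapes described above (state.ready as a string like "READY", a state.config_update field, and served_entities[*].entity_type in the list response); treat those field names as assumptions and double-check them against the API docs for your workspace.

```python
import json
import os
import urllib.request

# Entity types whose endpoints expose the /metrics exporter (per the list above).
METRICS_CAPABLE = {"CUSTOM_MODEL", "FEATURE_SPEC", "EXTERNAL_MODEL", "FOUNDATION_MODEL"}


def has_queryable_metrics(endpoint: dict) -> bool:
    """Decide from a /serving-endpoints list entry whether /metrics is worth calling."""
    state = endpoint.get("state", {})
    # Prerequisite: endpoint must be READY and not mid-update.
    if state.get("ready") != "READY" or state.get("config_update") == "IN_PROGRESS":
        return False
    entities = endpoint.get("config", {}).get("served_entities", [])
    # Every served entity must be a metrics-capable type; any
    # FOUNDATION_MODEL_API (pay-per-token) entity disqualifies the endpoint.
    return bool(entities) and all(
        e.get("entity_type") in METRICS_CAPABLE for e in entities
    )


def fetch_metrics(host: str, token: str) -> dict:
    """List endpoints, then fetch metrics only where they should exist."""
    headers = {"Authorization": f"Bearer {token}"}
    req = urllib.request.Request(f"{host}/api/2.0/serving-endpoints", headers=headers)
    with urllib.request.urlopen(req) as resp:
        endpoints = json.load(resp).get("endpoints", [])
    metrics = {}
    for ep in endpoints:
        if has_queryable_metrics(ep):
            m_req = urllib.request.Request(
                f"{host}/api/2.0/serving-endpoints/{ep['name']}/metrics",
                headers=headers,
            )
            with urllib.request.urlopen(m_req) as m_resp:
                metrics[ep["name"]] = m_resp.read().decode()
    return metrics


if __name__ == "__main__":
    # DATABRICKS_HOST / DATABRICKS_TOKEN are assumed environment variables.
    print(fetch_metrics(os.environ["DATABRICKS_HOST"], os.environ["DATABRICKS_TOKEN"]))
```

Even with this filter in place, it's worth keeping the 404 handling in your script as a fallback, since the exact behavior can change over time.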
If you need telemetry for the Foundation Model API endpoints (things like token usage), that lives in AI Gateway usage tracking and the system.serving.endpoint_usage and system.serving.served_entities system tables, not /metrics. So it's a different surface area depending on what you're measuring.
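As a sketch of that other surface area, a pay-per-token usage query against the system tables might look like the following. The column names (request_time, input_token_count, output_token_count, served_entity_id) are assumptions based on my reading of the system-table docs; verify them in your workspace before relying on this.

```python
# Hypothetical sketch: aggregate daily token usage per served entity from the
# system tables. Run this SQL in a notebook or via the SQL Statement Execution
# API; the column names should be verified against your workspace.
USAGE_QUERY = """
SELECT
  served_entity_id,
  DATE(request_time)     AS usage_date,
  SUM(input_token_count)  AS input_tokens,
  SUM(output_token_count) AS output_tokens,
  COUNT(*)                AS request_count
FROM system.serving.endpoint_usage
WHERE request_time >= current_date() - INTERVAL 7 DAYS
GROUP BY served_entity_id, DATE(request_time)
ORDER BY usage_date, served_entity_id
"""
```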
To be clear, nothing is broken here. It's just two different endpoint architectures with different observability paths, and your filter is the right way to reconcile them.
Hope this helps.
Cheers, Louis
yesterday
Your observation is correct: this behavior is expected.
Endpoints with entity_type = FOUNDATION_MODEL_API do not expose health metrics via the /metrics endpoint, which is why you're getting 404 responses. These endpoints are fully managed, multi-tenant APIs (typically pay-per-token), so infrastructure-level metrics (CPU, memory, etc.) aren't available through this API.
You'll generally get metrics for infrastructure-backed serving endpoints, such as:
CUSTOM_MODEL
FEATURE_SPEC
EXTERNAL_MODEL
FOUNDATION_MODEL (provisioned throughput)
These support metrics like latency, request count, and error rates.
You'll typically see 404 in cases like:
endpoints serving FOUNDATION_MODEL_API entities
endpoints that are not in a READY state
To avoid unnecessary API calls: filter the /serving-endpoints/list response on entity_type and endpoint state before calling /metrics.
For usage telemetry on Foundation Model API endpoints, use: AI Gateway usage tracking and the system.serving.endpoint_usage and system.serving.served_entities system tables.