<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Which types of model serving endpoints have health metrics available? in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/which-types-of-model-serving-endpoints-have-health-metrics/m-p/155682#M4613</link>
    <description>&lt;P&gt;Your observation is correct: this behavior is expected.&lt;/P&gt;&lt;P&gt;Endpoints with entity_type = FOUNDATION_MODEL_API do &lt;STRONG&gt;not expose health metrics via the /metrics endpoint&lt;/STRONG&gt;, which is why you’re getting 404 responses. These endpoints are fully managed, multi-tenant APIs (typically pay-per-token), so infrastructure-level metrics (CPU, memory, etc.) aren’t available through this API.&lt;/P&gt;&lt;H3&gt;&lt;SPAN&gt;When /metrics is available&lt;/SPAN&gt;&lt;/H3&gt;&lt;P&gt;You’ll generally get metrics for &lt;STRONG&gt;infrastructure-backed serving endpoints&lt;/STRONG&gt;, such as:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Custom model serving endpoints&lt;/LI&gt;&lt;LI&gt;Feature serving endpoints (feature specs)&lt;/LI&gt;&lt;LI&gt;External model endpoints&lt;/LI&gt;&lt;LI&gt;Foundation models with &lt;STRONG&gt;provisioned throughput&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;These support metrics like latency, request count, and error rates.&lt;/P&gt;&lt;H3&gt;&lt;SPAN&gt;When /metrics is not available&lt;/SPAN&gt;&lt;/H3&gt;&lt;P&gt;You’ll typically see 404 in cases like:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;entity_type = FOUNDATION_MODEL_API&lt;/LI&gt;&lt;LI&gt;Endpoint is not in a &lt;STRONG&gt;READY&lt;/STRONG&gt; state&lt;/LI&gt;&lt;LI&gt;Endpoint has no active backing compute&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Suggested filtering approach&lt;/H3&gt;&lt;P&gt;To avoid unnecessary API calls:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Skip endpoints where entity_type == FOUNDATION_MODEL_API&lt;/LI&gt;&lt;LI&gt;Check that the endpoint is in a &lt;STRONG&gt;READY&lt;/STRONG&gt; state before calling /metrics&lt;/LI&gt;&lt;LI&gt;Optionally inspect served_entities for more granular filtering&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Alternative for FOUNDATION_MODEL_API&lt;/H3&gt;&lt;P&gt;For these endpoints, use:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Usage tracking / billing data (token usage, request logs)&lt;/LI&gt;&lt;LI&gt;AI Gateway or system tables for observability&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Tue, 28 Apr 2026 12:55:17 GMT</pubDate>
    <dc:creator>johandoc</dc:creator>
    <dc:date>2026-04-28T12:55:17Z</dc:date>
    <item>
      <title>Which types of model serving endpoints have health metrics available?</title>
      <link>https://community.databricks.com/t5/machine-learning/which-types-of-model-serving-endpoints-have-health-metrics/m-p/151196#M4585</link>
      <description>&lt;P&gt;I am retrieving a list of model serving endpoints for my workspace via this API:&amp;nbsp;&lt;A href="https://docs.databricks.com/api/workspace/servingendpoints/list" target="_blank" rel="noopener"&gt;https://docs.databricks.com/api/workspace/servingendpoints/list&lt;/A&gt;&lt;BR /&gt;I then retrieve health metrics for each one with:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/machine-learning/model-serving/metrics-export-serving-endpoint" target="_self"&gt;https://[DATABRICKS_HOST]/api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics&lt;/A&gt;&lt;/P&gt;&lt;P&gt;However, some of these endpoints return 404 errors, meaning no metrics are available.&lt;/P&gt;&lt;P&gt;In my case these are all endpoints of type FOUNDATION_MODEL_API, but I don't have many examples, so I would like to know whether this is always the case and what the full list of situations is in which an endpoint would not have metric data, based on the response from the first API call to&amp;nbsp;&lt;A href="https://docs.databricks.com/api/workspace/servingendpoints/list" target="_blank" rel="noopener"&gt;servingendpoints/list&lt;/A&gt;. My use case is to add some filtering to improve the efficiency of my script so that I don't call /metrics on any endpoint that I know would not have metrics available.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2026 02:50:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/which-types-of-model-serving-endpoints-have-health-metrics/m-p/151196#M4585</guid>
      <dc:creator>KyraHinnegan</dc:creator>
      <dc:date>2026-03-18T02:50:55Z</dc:date>
    </item>
    <item>
      <title>Re: Which types of model serving endpoints have health metrics available?</title>
      <link>https://community.databricks.com/t5/machine-learning/which-types-of-model-serving-endpoints-have-health-metrics/m-p/151228#M4586</link>
      <description>&lt;P class="p1"&gt;Hey &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/202721"&gt;@KyraHinnegan&lt;/a&gt;,&lt;/P&gt;
&lt;P class="p1"&gt;I did some digging and here is what I found. Hopefully it helps you understand a bit more about what is going on.&lt;/P&gt;
&lt;P class="p1"&gt;At a high level, not every endpoint type exposes infrastructure health metrics via &lt;SPAN class="s1"&gt;/metrics&lt;/SPAN&gt;. What you’re seeing with &lt;SPAN class="s1"&gt;FOUNDATION_MODEL_API&lt;/SPAN&gt; returning a 404 is expected right now.&lt;/P&gt;
&lt;P class="p1"&gt;The &lt;SPAN class="s1"&gt;/api/2.0/serving-endpoints/[ENDPOINT_NAME]/metrics&lt;/SPAN&gt; endpoint is the health metrics exporter. This is where you get latency, request rate, error rate, CPU, memory, GPU — all the signals you’d typically pipe into Prometheus or Datadog.&lt;/P&gt;
&lt;P class="p1"&gt;That exporter is wired up for what I’d call the “standard” Mosaic AI Model Serving endpoints. Concretely, that includes endpoints where the served entities are:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;CUSTOM_MODEL&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;FEATURE_SPEC&lt;/SPAN&gt; (feature serving)&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;EXTERNAL_MODEL&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;FOUNDATION_MODEL&lt;/SPAN&gt; (provisioned throughput)&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p1"&gt;Where things diverge is with the Databricks-hosted Foundation Model API endpoints — the pay-per-token ones. These show up as &lt;SPAN class="s1"&gt;FOUNDATION_MODEL_API&lt;/SPAN&gt; when you call &lt;SPAN class="s1"&gt;/serving-endpoints/list&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P class="p1"&gt;Those are backed by a separate multi-tenant system, and as of today, they don’t expose infrastructure health metrics through &lt;SPAN class="s1"&gt;/metrics&lt;/SPAN&gt;. So a 404 there isn’t a misconfiguration — it’s just the current behavior. What you’re seeing lines up with that.&lt;/P&gt;
&lt;P class="p1"&gt;There are also a couple of edge cases worth keeping in mind.&lt;/P&gt;
&lt;P class="p1"&gt;First, some newer foundation models (for example, Llama 4 Maverick) don’t have full metrics support yet. Even when they’re provisioned throughput endpoints, the metrics surface can be more limited than what you’d get with a typical custom model.&lt;/P&gt;
&lt;P class="p1"&gt;Second, endpoint state matters. If an endpoint isn’t in a healthy &lt;SPAN class="s1"&gt;READY&lt;/SPAN&gt; state — say it’s still provisioning, failed, or mid-update — &lt;SPAN class="s1"&gt;/metrics&lt;/SPAN&gt; can fail. In practice, you want to treat &lt;SPAN class="s1"&gt;READY&lt;/SPAN&gt; as a prerequisite before even attempting the call.&lt;/P&gt;
&lt;P class="p1"&gt;Given your goal — avoiding &lt;SPAN class="s1"&gt;/metrics&lt;/SPAN&gt; calls where they’re guaranteed to fail — the cleanest approach is to filter up front based on &lt;SPAN class="s1"&gt;/serving-endpoints/list&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P class="p1"&gt;Only attempt &lt;SPAN class="s1"&gt;/metrics&lt;/SPAN&gt; when:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;state.ready&lt;/SPAN&gt; is true and the endpoint is not updating or being deleted, and&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;every &lt;/SPAN&gt;served_entities[*].entity_type&lt;SPAN class="s1"&gt; is one of:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;CUSTOM_MODEL&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;FEATURE_SPEC&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;EXTERNAL_MODEL&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;FOUNDATION_MODEL&lt;/SPAN&gt; (provisioned throughput)&lt;SPAN&gt;Skip &lt;/SPAN&gt;&lt;SPAN class="s1"&gt;/metrics&lt;/SPAN&gt;&lt;SPAN&gt; when:&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;any served entity has &lt;/SPAN&gt;entity_type = FOUNDATION_MODEL_API&lt;SPAN class="s1"&gt;, or&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;the endpoint is not in a &lt;SPAN class="s1"&gt;READY&lt;/SPAN&gt; state&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p1"&gt;That gives you a deterministic way to avoid 404s while still collecting metrics everywhere they actually exist.&lt;/P&gt;
&lt;P class="p1"&gt;If you need telemetry for the Foundation Model API endpoints — things like token usage — that lives in AI Gateway usage tracking and the &lt;SPAN class="s1"&gt;system.serving.endpoint_usage&lt;/SPAN&gt; and &lt;SPAN class="s1"&gt;system.serving.served_entities&lt;/SPAN&gt; system tables, not &lt;SPAN class="s1"&gt;/metrics&lt;/SPAN&gt;. So it’s a different surface area depending on what you’re measuring.&lt;/P&gt;
&lt;P class="p1"&gt;To be clear, nothing is broken here. It’s just two different endpoint architectures with different observability paths, and your filter is the right way to reconcile them.&lt;/P&gt;
&lt;P class="p2"&gt;Hope this helps.&lt;/P&gt;
&lt;P class="p1"&gt;Cheers, Louis&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2026 09:55:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/which-types-of-model-serving-endpoints-have-health-metrics/m-p/151228#M4586</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-03-18T09:55:24Z</dc:date>
    </item>
    <item>
      <title>Re: Which types of model serving endpoints have health metrics available?</title>
      <link>https://community.databricks.com/t5/machine-learning/which-types-of-model-serving-endpoints-have-health-metrics/m-p/155682#M4613</link>
      <description>&lt;P&gt;Your observation is correct: this behavior is expected.&lt;/P&gt;&lt;P&gt;Endpoints with entity_type = FOUNDATION_MODEL_API do &lt;STRONG&gt;not expose health metrics via the /metrics endpoint&lt;/STRONG&gt;, which is why you’re getting 404 responses. These endpoints are fully managed, multi-tenant APIs (typically pay-per-token), so infrastructure-level metrics (CPU, memory, etc.) aren’t available through this API.&lt;/P&gt;&lt;H3&gt;&lt;SPAN&gt;When /metrics is available&lt;/SPAN&gt;&lt;/H3&gt;&lt;P&gt;You’ll generally get metrics for &lt;STRONG&gt;infrastructure-backed serving endpoints&lt;/STRONG&gt;, such as:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Custom model serving endpoints&lt;/LI&gt;&lt;LI&gt;Feature serving endpoints (feature specs)&lt;/LI&gt;&lt;LI&gt;External model endpoints&lt;/LI&gt;&lt;LI&gt;Foundation models with &lt;STRONG&gt;provisioned throughput&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;These support metrics like latency, request count, and error rates.&lt;/P&gt;&lt;H3&gt;&lt;SPAN&gt;When /metrics is not available&lt;/SPAN&gt;&lt;/H3&gt;&lt;P&gt;You’ll typically see 404 in cases like:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;entity_type = FOUNDATION_MODEL_API&lt;/LI&gt;&lt;LI&gt;Endpoint is not in a &lt;STRONG&gt;READY&lt;/STRONG&gt; state&lt;/LI&gt;&lt;LI&gt;Endpoint has no active backing compute&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Suggested filtering approach&lt;/H3&gt;&lt;P&gt;To avoid unnecessary API calls:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Skip endpoints where entity_type == FOUNDATION_MODEL_API&lt;/LI&gt;&lt;LI&gt;Check that the endpoint is in a &lt;STRONG&gt;READY&lt;/STRONG&gt; state before calling /metrics&lt;/LI&gt;&lt;LI&gt;Optionally inspect served_entities for more granular filtering&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Alternative for FOUNDATION_MODEL_API&lt;/H3&gt;&lt;P&gt;For these endpoints, use:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Usage tracking / billing data (token usage, request logs)&lt;/LI&gt;&lt;LI&gt;AI Gateway or system tables for observability&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 28 Apr 2026 12:55:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/which-types-of-model-serving-endpoints-have-health-metrics/m-p/155682#M4613</guid>
      <dc:creator>johandoc</dc:creator>
      <dc:date>2026-04-28T12:55:17Z</dc:date>
    </item>
  </channel>
</rss>

