In a recent engagement, I partnered with a customer who had successfully productionalized sophisticated AI use cases using Genie Spaces, Multi-Agent Supervisor systems, and managed MCP servers. Their innovation velocity was impressive.
As part of a workspace review, we identified more than $150,000 in potential annualized cost savings through improved Model Serving hygiene.
While Scale-to-Zero guidance is clearly documented (Section 2.vii), identifying steady-state serving patterns directly through the UI can be less intuitive at scale. By leveraging system-level telemetry, we were able to bring clarity to usage behavior that wasn’t immediately obvious in the Serving tab.
In this workspace, several Model Serving Endpoints displayed as “Not Ready” in the UI, yet underlying compute remained active in a consistent pattern. Optimizing these endpoints resulted in approximately $9,000 per month in cost reduction, without impacting production workloads.
“Steady-state usage” describes endpoints whose consumption looks perfectly flat over time. If compute consumption doesn’t move at all, down to the second decimal place, for four straight weeks, that’s usually a sign the workload isn’t responding to real demand. It doesn’t mean something’s wrong, but it does mean it’s worth a quick optimization check.
For analytical purposes, we defined a measurable signal of optimization opportunity:
Four consecutive weeks of identical usage (to the second decimal place).
This pattern can reasonably occur in three scenarios:
It’s important to note:
Not all steady-state usage is inappropriate.
Some production workloads are intentionally configured to avoid scale-up latency. The goal is not automatic pausing; it is visibility and intentionality.
Optimization is about ensuring resources reflect business intent.
An internal Logfood dashboard, “Steady-State Model Serving Endpoint Usage”, enables filtering by account to review Model Serving consumption across workspaces.
Optimization Signal: Identical weekly DBU usage across multiple consecutive weeks.
Expected Healthy Pattern: Natural variation in weekly usage reflecting real demand.
Workspace Admins can independently perform endpoint-level analysis using System Tables:
-- Weekly usage per serving endpoint over the past 365 days
SELECT
  u.workspace_id,
  custom_tags.EndpointId AS endpoint_id,
  usage_metadata.endpoint_name AS endpoint_name,
  u.billing_origin_product,
  -- Truncate to the week start (Monday) so each row covers one week, not one day
  date_format(date_trunc('WEEK', u.usage_start_time), 'yyyy-MM-dd') AS usage_week,
  sum(u.usage_quantity) AS usage_quantity
FROM system.billing.usage u
WHERE u.usage_date BETWEEN dateadd(DAY, -365, current_date()) AND current_date()
  AND u.billing_origin_product IN ('MODEL_SERVING', 'VECTOR_SEARCH')
GROUP BY 1, 2, 3, 4, 5
ORDER BY endpoint_name DESC, usage_week DESC
Optimization Signal: Consistent, identical DBU consumption across multiple weeks for a given endpoint.
Expected Behavior: Natural week-to-week variance in usage, whether small or large.
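Scanning a year of weekly rows by eye gets tedious across many endpoints, so the signal can also be checked programmatically. The following is a minimal sketch, not part of the original review tooling. It assumes the usage data has already been aggregated to one row per endpoint per week, exported as `(endpoint_name, usage_week, usage_quantity)` tuples with `usage_week` as a `yyyy-MM-dd` week-start date. The function name, parameters, and sample endpoint names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_steady_state_endpoints(rows, min_weeks=4, decimals=2):
    """Return {endpoint_name: longest_flat_run} for endpoints whose weekly usage
    is identical (after rounding) for at least `min_weeks` consecutive weeks.

    `rows` is an iterable of (endpoint_name, usage_week, usage_quantity) tuples.
    This input shape is an assumption made for illustration.
    """
    by_endpoint = defaultdict(list)
    for endpoint, week, quantity in rows:
        by_endpoint[endpoint].append(
            (date.fromisoformat(week), round(float(quantity), decimals))
        )

    flagged = {}
    for endpoint, weekly in by_endpoint.items():
        weekly.sort()  # chronological order
        longest = run = 0
        prev_week = prev_qty = None
        for week, qty in weekly:
            # Only count calendar-adjacent weeks; a missing week breaks the streak.
            adjacent = prev_week is not None and week - prev_week == timedelta(weeks=1)
            if adjacent and qty == prev_qty and qty > 0:
                run += 1
            else:
                run = 1
            longest = max(longest, run)
            prev_week, prev_qty = week, qty
        if longest >= min_weeks:
            flagged[endpoint] = longest
    return flagged


# Hypothetical sample data: one flat endpoint, one with natural variation.
sample_rows = [
    ("idle-endpoint", "2024-01-01", 168.00),
    ("idle-endpoint", "2024-01-08", 168.00),
    ("idle-endpoint", "2024-01-15", 168.00),
    ("idle-endpoint", "2024-01-22", 168.00),
    ("busy-endpoint", "2024-01-01", 91.37),
    ("busy-endpoint", "2024-01-08", 120.05),
    ("busy-endpoint", "2024-01-15", 88.60),
    ("busy-endpoint", "2024-01-22", 131.12),
]

print(find_steady_state_endpoints(sample_rows))  # {'idle-endpoint': 4}
```

A flagged endpoint is a candidate for review, not for automatic shutdown: the output only says the usage was flat, not whether that was intended.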
The dashboard and query provide the observability layer. From there:
The key step is alignment, not automation. Some endpoints are intentionally persistent. Others represent experimentation or retired use cases that can now be right-sized.
Even highly sophisticated teams benefit from structured workspace hygiene reviews. As AI innovation accelerates, experimentation naturally expands the compute footprint.
By establishing clear signals for steady-state usage and leveraging System Tables for observability, organizations can:
What began as a simple review resulted in six-figure annual savings, not through reduction, but through refinement.