Model Serving Latency Chart
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-30-2024 12:07 PM
Hi,
For the model serving latency graph what is p50 and p99? I only have one model i am serving on this endpoing so im surprised to see two models being tracked
- Labels:
-
Feature Store
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-30-2024 12:09 PM - edited 04-30-2024 12:11 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-07-2024 09:37 AM
@Kaizen - Please refer to the below explanation.
In a model latency chart, P50 and P99 represent the median and 99th percentile round-trip latency times respectively.- P50 (Latency at 50th percentile) is the median latency, meaning that 50% of the requests have a latency that is less than this value and 50% have a latency that is greater.
- P99 (Latency at 99th percentile) is the value below which 99% of the observations may be found. In other words, only 1% of the requests have a latency that is greater than this value.These metrics are used to understand the distribution of latency and to identify outliers or abnormal behavior in system performance.

