Re: Model Serving Latency Chart

Kaizen · 04-30-2024

Hi,

For the model serving latency graph, what are P50 and P99? I'm only serving one model on this endpoint, so I'm surprised to see two models being tracked.

Kaizen · 04-30-2024

If I'm not mistaken, this refers to 50% of responses and 99% of responses, with the metrics averaged accordingly?

@s_park
@Sujitha
@Debayan

shan_chandra · 05-07-2024

@Kaizen - Please refer to the below explanation.

In a model latency chart, P50 and P99 represent the median and 99th percentile round-trip latency times, respectively.

- P50 (latency at the 50th percentile) is the median latency, meaning that 50% of the requests have a latency that is less than this value and 50% have a latency that is greater.
- P99 (latency at the 99th percentile) is the value below which 99% of the observations may be found. In other words, only 1% of the requests have a latency that is greater than this value.

These metrics are used to understand the distribution of latency and to identify outliers or abnormal behavior in system performance.
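To make the definitions concrete, here is a minimal Python sketch that computes P50 and P99 from a list of latency samples for a single model, using the nearest-rank percentile method. The `percentile` helper and the sample latencies are hypothetical, and the Databricks metrics pipeline may estimate percentiles differently (for example over time windows or with interpolation), so treat this only as an illustration of what the two chart lines mean. It also shows that neither number is an average.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to avoid float rounding surprises
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical round-trip latencies (ms) for ONE served model:
# 98 fast requests plus 2 slow outliers.
latencies_ms = [20.0] * 49 + [30.0] * 49 + [400.0, 900.0]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
avg = mean(latencies_ms)

print(f"p50 = {p50} ms, p99 = {p99} ms, mean = {avg:.1f} ms")
# prints "p50 = 30.0 ms, p99 = 400.0 ms, mean = 37.5 ms"
```

Both values come from the same set of requests, so the two lines on the chart describe one model's latency distribution rather than two models. With just two slow requests out of 100, P50 stays at 30 ms while P99 jumps to 400 ms, and the mean (37.5 ms) hides that tail almost entirely.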

Reference: https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html#s...