Model Serving Latency Chart

Kaizen — Tue, 30 Apr 2024 19:07:50 GMT

Hi,

For the model serving latency graph, what are p50 and p99? I only have one model I am serving on this endpoint, so I'm surprised to see two models being tracked.

Re: Model Serving Latency Chart

Kaizen — Tue, 30 Apr 2024 19:11:09 GMT

If I'm not mistaken, this refers to 50% of responses and 99% of responses, and averages accordingly for the metrics?

@s_park
@Sujitha
@Debayan

Re: Model Serving Latency Chart

shan_chandra — Tue, 07 May 2024 16:37:37 GMT

@Kaizen - Please refer to the below explanation.

In a model latency chart, P50 and P99 represent the median and 99th percentile round-trip latency times respectively.

- P50 (Latency at 50th percentile) is the median latency, meaning that 50% of the requests have a latency that is less than this value and 50% have a latency that is greater.
- P99 (Latency at 99th percentile) is the value below which 99% of the observations may be found. In other words, only 1% of the requests have a latency that is greater than this value.

These metrics are used to understand the distribution of latency and to identify outliers or abnormal behavior in system performance.
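
To make the definitions concrete, here is a minimal Python sketch that computes both percentiles from a single list of request latencies, so both values describe the same model. The latency values are made up for illustration, and the nearest-rank method used in the hypothetical `percentile` helper is an assumption; the exact aggregation the serving chart uses may differ.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so very small percentiles still map to a sample.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for 100 requests to ONE model:
# 98 fast requests between 20 and 29 ms, plus 2 slow outliers.
latencies_ms = [20 + (i % 10) for i in range(98)] + [450, 900]

p50 = percentile(latencies_ms, 50)  # typical request
p99 = percentile(latencies_ms, 99)  # tail latency, dominated by outliers

print(f"p50 = {p50} ms, p99 = {p99} ms")
```

In this made-up sample the two numbers sit far apart even though every request went to the same model, which is why the chart shows two lines: they are two summaries of one latency distribution, not two models.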

Reference: https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html#serving-endpoint-metrics-definitions
