Model Serving Latency Chart

Kaizen
Valued Contributor

Hi, 

For the model serving latency graph, what are P50 and P99? I only have one model serving on this endpoint, so I'm surprised to see two models being tracked.


Kaizen
Valued Contributor

If I'm not mistaken, this refers to 50% of responses and 99% of responses, with the metrics averaged accordingly?


@s_park 
@Sujitha 
@Debayan 

shan_chandra
Databricks Employee

@Kaizen - Please refer to the below explanation.

In a model latency chart, P50 and P99 represent the median and 99th-percentile round-trip latency, respectively.
- P50 (latency at the 50th percentile) is the median latency: 50% of requests have a latency below this value and 50% have a latency above it.
- P99 (latency at the 99th percentile) is the value below which 99% of the observations fall. In other words, only 1% of requests have a latency greater than this value.

These metrics describe the distribution of latency and help identify outliers or abnormal behavior in system performance. Neither is an average. The two lines on the chart are therefore two statistics computed over the same endpoint's requests, not two separate models.
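To make the distinction concrete, here is a minimal Python sketch that computes P50 and P99 from a list of request latencies using the nearest-rank percentile method. The function name `percentile_nearest_rank` and the latency values are hypothetical, chosen only for illustration. The exact method Databricks uses for its chart (interpolation scheme, aggregation window) is an assumption here and may differ.

```python
import math


def percentile_nearest_rank(samples, p):
    """Hypothetical helper: p-th percentile (0 < p <= 100) via the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    # Multiply before dividing so integer inputs give an exact rank.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical round-trip latencies (ms) for 100 requests to ONE endpoint:
# 98 fast requests (40-49 ms) plus 2 slow outliers.
latencies_ms = [40 + (i % 10) for i in range(98)] + [900, 1200]

p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"P50 = {p50} ms, P99 = {p99} ms, mean = {mean:.1f} ms")
# prints "P50 = 44 ms, P99 = 900 ms, mean = 64.5 ms"
```

Both numbers come from the same single set of requests. P50 reflects the typical request, while P99 exposes the slow tail that the mean blurs.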

Reference: https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html#s...