
Model Serving Latency Chart

Kaizen
Contributor III

Hi, 

For the model serving latency graph, what are p50 and p99? I am only serving one model on this endpoint, so I'm surprised to see two models being tracked.

 

[Attached image: Kaizen_0-1714504038212.png]

 


Kaizen
Contributor III

If I'm not mistaken, these refer to 50% of responses and 99% of responses, with the metric averaged accordingly?

 

@s_park 
@Sujitha 
@Debayan 

shan_chandra
Esteemed Contributor

@Kaizen - Please refer to the below explanation.

In a model latency chart, P50 and P99 represent the median and 99th percentile round-trip latency times, respectively.
- P50 (latency at the 50th percentile) is the median latency: 50% of requests have a latency below this value and 50% have a latency above it.
- P99 (latency at the 99th percentile) is the value below which 99% of observations fall. In other words, only 1% of requests have a latency greater than this value.

These metrics are used to understand the distribution of latency and to identify outliers or abnormal behavior in system performance.
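
To make the definitions concrete, here is a minimal sketch of how two percentile values can come out of a single set of request latencies, which is why one served model can appear as two lines on the chart. The `percentile` helper and the sample latencies are hypothetical and use the simple nearest-rank method; this is not necessarily how Databricks computes the chart internally.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small p still returns a sample.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for ONE model: 98 fast requests, 2 slow outliers.
latencies_ms = [20] * 50 + [30] * 48 + [900, 1200]

p50 = percentile(latencies_ms, 50)  # typical request
p99 = percentile(latencies_ms, 99)  # tail latency, dominated by outliers
print(f"p50 = {p50} ms, p99 = {p99} ms")
```

With this sample data, p50 stays near the typical request time while p99 jumps to the slow outliers, so a large gap between the two lines points to a slow tail rather than to a second model.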

Reference: https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html#s...