@Kaizen - Please refer to the explanation below.
In a model latency chart, P50 and P99 represent the median and the 99th percentile of round-trip latency, respectively.
- P50 (latency at the 50th percentile) is the median latency: 50% of requests have a latency below this value and 50% have a latency above it.
- P99 (latency at the 99th percentile) is the value below which 99% of observations fall. In other words, only 1% of requests have a latency greater than this value.
These metrics are used to understand the distribution of latency and to identify outliers or abnormal behavior in system performance.
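To make the two definitions concrete, here is a minimal sketch that computes P50 and P99 from a list of latencies. The sample values and the `percentile` helper are made up for illustration, and the nearest-rank method used here is only one common way to define a percentile; it is not necessarily how the serving endpoint computes the values shown in the chart.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so that pct=0 returns the minimum.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical round-trip latencies in milliseconds:
# 98 ordinary requests (20-29 ms) plus 2 slow outliers.
latencies_ms = [20 + (i % 10) for i in range(98)] + [900, 1200]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(f"P50 = {p50} ms")
print(f"P99 = {p99} ms")
```

With this sample, P50 stays in the range of the ordinary requests while P99 lands on one of the slow outliers, which is why a gap between the two lines in the chart usually points to a small share of slow requests rather than a general slowdown.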
Reference: https://docs.databricks.com/en/machine-learning/model-serving/metrics-export-serving-endpoint.html#s...