Endpoint performance questions

Kaizen
Contributor III

Hi! 
I got some really interesting results from the endpoint performance tests I ran. I set up the non-optimized endpoint with scale-to-zero enabled, and the optimized endpoint with this feature disabled.

1) Why does the non-optimized endpoint show variable response times across the 3600, 1800, and 600 second tests? If the serving cluster node scaled to 0 (due to no traffic), I would expect it to also require ~240 seconds to start up and begin serving again.

- What is going on behind the scenes that causes this?

2) It was also interesting to see that the endpoint metrics showed request error rates (top right graph), even though the endpoint didn't return any bad responses and the logs didn't show anything that would explain this. Any idea why this would be the case? See the metrics image below.

3) I didn't find much information on this in the Databricks documentation. Any additional documentation would be appreciated! Happy to sync with the team.

non-optimized endpoint results

Kaizen_1-1710196442817.png


optimized endpoint results 

Kaizen_0-1710196408535.png

metrics log:

Kaizen_2-1710196880601.png
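For context on the setup being compared, scale-to-zero is a per-served-model flag in the endpoint configuration. Here is a minimal sketch of the two configurations, assuming the field names of the Databricks serving-endpoints REST API (`scale_to_zero_enabled` on each served entity); the endpoint and model names are made up:

```python
# Sketch of the two endpoint configurations compared above.
# Endpoint/model names are hypothetical; the relevant flag is
# scale_to_zero_enabled on each served entity.

def endpoint_config(name: str, scale_to_zero: bool) -> dict:
    """Build a serving-endpoint config payload."""
    return {
        "name": name,
        "config": {
            "served_entities": [
                {
                    "entity_name": "my_catalog.my_schema.my_model",  # hypothetical
                    "entity_version": "1",
                    "workload_size": "Small",
                    "scale_to_zero_enabled": scale_to_zero,
                }
            ]
        },
    }

non_optimized = endpoint_config("test-endpoint-scale-to-zero", scale_to_zero=True)
optimized = endpoint_config("test-endpoint-always-on", scale_to_zero=False)
```

In practice the payload would be sent to the workspace's serving-endpoints API (or built via the Databricks SDK); this only illustrates where the flag lives.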

 

1 ACCEPTED SOLUTION


Kaizen
Contributor III

I independently found the answer to item 2: the 30-minute scale-to-zero idle timeout currently cannot be modified.

Hope this helps someone in the future!


6 REPLIES

Kaizen
Contributor III

Answering Q1:
1) The variable response time is due to the first request taking ~180 seconds while the serving cluster scales from 0 to 1.

2) Can I change the scale-to-zero time from the preset 30 minutes?
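For anyone reproducing this kind of test, one way to separate the cold-start cost from steady-state latency is to time the first request separately from the warm ones. A minimal sketch, with the actual endpoint call (e.g. an authenticated HTTP POST to the endpoint's invocations URL) left as a stub you would supply:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Return (elapsed_seconds, result) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result

def probe(endpoint_call, n_warm=5):
    """Time one (possibly cold) request, then n_warm follow-up requests.

    A large gap between the first timing and the rest suggests the
    endpoint was scaled to zero and had to start up first.
    """
    cold, _ = timed_call(endpoint_call)
    warm = [timed_call(endpoint_call)[0] for _ in range(n_warm)]
    return cold, warm
```

The names here are illustrative; the point is simply to report the first-request latency on its own rather than averaging it into the rest.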

Kaniz
Community Manager

Hi @Kaizen! Let's delve into your intriguing endpoint performance observations:

  1. Variable Response Time:

    • The non-optimized endpoint exhibiting variable response times during different test durations (3600, 1800, and 600 seconds) can be attributed to the following factors:
      • Scaling Delay: When the serving cluster node scales down to zero due to no traffic, it indeed requires time to start up again. However, the startup time may not be a fixed 240 seconds. It depends on various factors such as the infrastructure, resource allocation, and initialization processes.
      • Resource Warm-Up: After scaling up, the endpoint needs to warm up its resources (e.g., loading models, initializing connections, caching data). This warm-up period introduces variability in response times.
      • Dynamic Load Balancing: The system might distribute incoming requests across available nodes dynamically. As a result, response times can fluctuate based on the current load distribution.
    • Behind the scenes, the orchestration involves managing resources, network communication, and service initialization, leading to the observed variability.
  2. Request Error Rates:

    • Even though the endpoint didn’t produce any bad responses, the request error rates were noticeable. Here are potential reasons:
      • Transient Issues: Some requests might have encountered transient issues (e.g., network glitches, timeouts, or resource contention) without resulting in explicit errors.
      • Latency Thresholds: Requests that exceed certain latency thresholds (e.g., due to resource contention or slow processing) may be counted as errors.
      • Load Spikes: Sudden spikes in traffic can lead to resource saturation, causing intermittent errors.
    • Investigating the logs further might reveal additional context, but sometimes these subtle anomalies remain elusive.
  3. Documentation:

    • While Databricks documentation might not explicitly cover every nuance, consider exploring general performance optimization strategies:
      • Caching: Optimize data caching to reduce redundant computations.
      • Compression: Compress responses to minimize network overhead.
      • Concurrency Control: Ensure proper concurrency management to prevent resource bottlenecks.
      • Monitoring and Alerts: Set up comprehensive monitoring and alerts to catch anomalies promptly.

 

 

Kaizen
Contributor III

Thanks for this. 

1) The odd values I got for the 3600/1800/etc. tests were due to an outlier in my data, so in general a response time of ~183 seconds should be expected.

2) @Kaniz, can we adjust the cluster's scale-to-zero time from 30 minutes to something else?
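The outlier effect mentioned in item 1 is why a robust statistic such as the median is safer than the mean when summarizing response times that may include a cold start. For example (illustrative numbers, not the actual test data):

```python
from statistics import mean, median

# Warm-request latencies in seconds, with one cold-start outlier mixed in.
samples = [1.2, 1.1, 1.3, 1.2, 183.0]

print(round(mean(samples), 1))    # → 37.6, skewed by the single cold start
print(round(median(samples), 1))  # → 1.2, reflects typical warm latency
```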

Kaizen
Contributor III

@s_park / @Sujitha / @Debayan, could one of you address item 2?
