
Endpoint performance questions

Kaizen
Valued Contributor

Hi! 
Had some really interesting results from endpoint performance tests I ran. I set up the non-optimized endpoint with scale-to-zero enabled, while the optimized endpoint had this feature disabled.

1) Why does the non-optimized endpoint show variable response times for the 3600-, 1800-, and 600-second tests? If the serving cluster node scaled to 0 (due to no traffic), I would expect it to consistently require the same 240 seconds to start up and begin serving again.

- What is going on behind the scenes that results in this?

2) It was also interesting to see that the endpoint metrics showed request error rates (top right graph). The endpoint didn't return any bad responses, and the logs didn't show anything that would explain this. Any idea why this would be the case? See below for the metrics image.

3) I didn't find much information on this in the Databricks documentation. Any additional documentation would be appreciated! Happy to sync with the team.

[Image: non-optimized endpoint results]

[Image: optimized endpoint results]

[Image: metrics log]
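For anyone reproducing a similar comparison, here is a minimal sketch of the setup using the serving-endpoints REST API. The endpoint name, model name, and version are placeholders, and DATABRICKS_HOST/DATABRICKS_TOKEN are assumed environment variables:

```python
import os

import requests

# Assumed placeholders -- substitute your workspace URL, a PAT, and your model.
HOST = os.environ["DATABRICKS_HOST"]    # e.g. "https://my-workspace.cloud.databricks.com"
TOKEN = os.environ["DATABRICKS_TOKEN"]

# Create a serving endpoint with scale-to-zero enabled; the "optimized"
# endpoint in the comparison above would set scale_to_zero_enabled=False.
resp = requests.post(
    f"{HOST}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "perf-test-endpoint",  # hypothetical endpoint name
        "config": {
            "served_entities": [
                {
                    "entity_name": "my_catalog.my_schema.my_model",  # placeholder
                    "entity_version": "1",
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                }
            ]
        },
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["name"], "created")
```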

 

1 ACCEPTED SOLUTION

Kaizen
Valued Contributor

Independently found the solution to item 2: currently you cannot modify the 30-minute idle window for scale to zero.

Hope this helps someone in the future!


6 REPLIES

Kaizen
Valued Contributor

Answering Q1:
1) The variable response time is due to the first request requiring ~180 seconds for the endpoint to scale from 0 to 1 cluster.

2) Can I change the scale-to-zero time from the preset 30 minutes?
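While waiting on an answer for (2), one way to confirm what an endpoint currently has configured is to read it back from the REST API. A sketch with the same assumed placeholders as above; note there is no idle-timeout field to inspect, only the boolean flag:

```python
import os

import requests

HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]

# Read the endpoint definition back and report the scale-to-zero flag
# for each served entity.
resp = requests.get(
    f"{HOST}/api/2.0/serving-endpoints/perf-test-endpoint",  # hypothetical name
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=60,
)
resp.raise_for_status()
for entity in resp.json()["config"]["served_entities"]:
    print(entity.get("name"), "scale_to_zero_enabled =", entity.get("scale_to_zero_enabled"))
```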

Kaniz_Fatma
Community Manager

Hi @Kaizen, let's delve into your intriguing endpoint performance observations:

  1. Variable Response Time:

    • The non-optimized endpoint exhibiting variable response times during different test durations (3600, 1800, and 600 seconds) can be attributed to the following factors:
      • Scaling Delay: When the serving cluster node scales down to zero due to no traffic, it indeed requires time to start up again. However, the startup time may not be a fixed 240 seconds. It depends on various factors such as the infrastructure, resource allocation, and initialization processes.
      • Resource Warm-Up: After scaling up, the endpoint needs to warm up its resources (e.g., loading models, initializing connections, caching data). This warm-up period introduces variability in response times.
      • Dynamic Load Balancing: The system might distribute incoming requests across available nodes dynamically. As a result, response times can fluctuate based on the current load distribution.
    • Behind the scenes, the orchestration involves managing resources, network communication, and service initialization, leading to the observed variability; the timing sketch after this list shows one way to measure it.
  2. Request Error Rates:

    • Even though the endpoint didn’t produce any bad responses, the request error rates were noticeable. Here are potential reasons:
      • Transient Issues: Some requests might have encountered transient issues (e.g., network glitches, timeouts, or resource contention) without resulting in explicit errors.
      • Latency Thresholds: Requests that exceed certain latency thresholds (e.g., due to resource contention or slow processing) could be considered as errors.
      • Load Spikes: Sudden spikes in traffic can lead to resource saturation, causing intermittent errors.
    • Investigating the logs further might reveal additional context, but sometimes these subtle anomalies remain elusive.
  3. Documentation:

    • While Databricks documentation might not explicitly cover every nuance, consider exploring general performance optimization strategies:
      • Caching: Optimize data caching to reduce redundant computations.
      • Compression: Compress responses to minimize network overhead.
      • Concurrency Control: Ensure proper concurrency management to prevent resource bottlenecks.
      • Monitoring and Alerts: Set up comprehensive monitoring and alerts to catch anomalies promptly.
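To make the cold-start effect concrete, here is a minimal timing probe against the endpoint's invocations URL. The endpoint name is the same hypothetical one as earlier in the thread, and the payload shape is a placeholder that depends on your model's input signature:

```python
import os
import time

import requests

HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]
URL = f"{HOST}/serving-endpoints/perf-test-endpoint/invocations"  # hypothetical

# Placeholder payload -- adjust to your model's input signature.
payload = {"dataframe_records": [{"feature_1": 1.0, "feature_2": 2.0}]}

# Issue a few back-to-back requests. After the endpoint has scaled to zero,
# the first request pays the full cluster start-up and model-load cost
# (minutes), while warm requests should come back in well under a second.
for i in range(5):
    start = time.perf_counter()
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
        timeout=600,  # generous, so a cold start does not trip the client timeout
    )
    elapsed = time.perf_counter() - start
    print(f"request {i}: status={resp.status_code}, latency={elapsed:.1f}s")
```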

 

 

Kaizen
Valued Contributor

Thanks for this. 

1) The odd values I got for the 3600/1800/etc. tests were due to an outlier in my data, so in general a response time of ~183 seconds should be expected.

2) @Kaniz_Fatma can we adjust the scale-to-zero time of the cluster from 30 minutes to something else?

Kaizen
Valued Contributor

@s_park / @Sujitha / @Debayan, could one of you address item 2?

Kaizen
Valued Contributor

Independently found the solution to item 2: currently you cannot modify the 30-minute idle window for scale to zero.

Hope this helps someone in the future!
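To make that concrete: in the current serving-endpoints API, scale-to-zero is exposed only as a boolean on each served entity, so there is nothing to tune. A sketch of an update call, with the same assumed placeholders as earlier in the thread:

```python
import os

import requests

HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]

# Update the endpoint config: scale_to_zero_enabled is a plain on/off flag.
# There is no field for the idle window itself -- the ~30-minute interval
# before scaling to zero is fixed by the platform.
resp = requests.put(
    f"{HOST}/api/2.0/serving-endpoints/perf-test-endpoint/config",  # hypothetical
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "served_entities": [
            {
                "entity_name": "my_catalog.my_schema.my_model",  # placeholder
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,  # boolean only; no timeout knob
            }
        ]
    },
    timeout=60,
)
resp.raise_for_status()
```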
