Endpoint performance questions

Kaizen
Contributor III

Hi! 
I got some really interesting results from the endpoint performance tests I ran. I set up the non-optimized endpoint with scale-to-zero enabled, and the optimized endpoint with this feature disabled.

1) Why does the non-optimized endpoint show variable response times across the 3600, 1800, and 600 second tests? If the serving cluster node scaled to 0 (due to no traffic), I would expect it to also require ~240 seconds to start up and begin serving again.

- What is going on behind the scenes that causes this?

2) It was also interesting to see that the endpoint metrics showed request error rates (top right graph), even though the endpoint didn't return any bad responses and the logs didn't show anything that would explain this. Any idea why this would be the case? See the metrics image below.

3) I didn't find much information on this in the Databricks documentation. Any additional documentation would be appreciated! Happy to sync with the team.

non-optimized endpoint results

Kaizen_1-1710196442817.png


optimized endpoint results 

Kaizen_0-1710196408535.png

metrics log:

Kaizen_2-1710196880601.png
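For context on the setup being compared, scale-to-zero is a per-served-model flag in the endpoint configuration. Here is a minimal sketch of the two configurations, assuming the field names of the Databricks serving-endpoints REST API (`scale_to_zero_enabled` on each served entity); the endpoint and model names are made up:

```python
# Sketch of the two endpoint configurations compared above.
# Endpoint/model names are hypothetical; the relevant flag is
# scale_to_zero_enabled on each served entity.

def endpoint_config(name: str, scale_to_zero: bool) -> dict:
    """Build a serving-endpoint config payload."""
    return {
        "name": name,
        "config": {
            "served_entities": [
                {
                    "entity_name": "my_catalog.my_schema.my_model",  # hypothetical
                    "entity_version": "1",
                    "workload_size": "Small",
                    "scale_to_zero_enabled": scale_to_zero,
                }
            ]
        },
    }

non_optimized = endpoint_config("test-endpoint-scale-to-zero", scale_to_zero=True)
optimized = endpoint_config("test-endpoint-always-on", scale_to_zero=False)
```

In practice the payload would be sent to the workspace's serving-endpoints API (or built via the Databricks SDK); this only illustrates where the flag lives.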

 

1 ACCEPTED SOLUTION


Kaizen
Contributor III

I independently found the answer to item 2: the 30-minute scale-to-zero idle timeout currently cannot be modified.

Hope this helps someone in the future!


6 REPLIES

Kaizen
Contributor III

Answering Q1:
1) The variable response time is due to the first request taking ~180 seconds while the serving cluster scales from 0 to 1.

2) Can I change the scale-to-zero time from the preset 30 minutes?
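For anyone reproducing this kind of test, one way to separate the cold-start cost from steady-state latency is to time the first request separately from the warm ones. A minimal sketch, with the actual endpoint call (e.g. an authenticated HTTP POST to the endpoint's invocations URL) left as a stub you would supply:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Return (elapsed_seconds, result) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result

def probe(endpoint_call, n_warm=5):
    """Time one (possibly cold) request, then n_warm follow-up requests.

    A large gap between the first timing and the rest suggests the
    endpoint was scaled to zero and had to start up first.
    """
    cold, _ = timed_call(endpoint_call)
    warm = [timed_call(endpoint_call)[0] for _ in range(n_warm)]
    return cold, warm
```

The names here are illustrative; the point is simply to report the first-request latency on its own rather than averaging it into the rest.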

Kaniz
Community Manager

Hi @Kaizen! Let's delve into your intriguing endpoint performance observations:

  1. Variable Response Time:

    • The non-optimized endpoint exhibiting variable response times during different test durations (3600, 1800, and 600 seconds) can be attributed to the following factors:
      • Scaling Delay: When the serving cluster node scales down to zero due to no traffic, it indeed requires time to start up again. However, the startup time may not be a fixed 240 seconds. It depends on various factors such as the infrastructure, resource allocation, and initialization processes.
      • Resource Warm-Up: After scaling up, the endpoint needs to warm up its resources (e.g., loading models, initializing connections, caching data). This warm-up period introduces variability in response times.
      • Dynamic Load Balancing: The system might distribute incoming requests across available nodes dynamically. As a result, response times can fluctuate based on the current load distribution.
    • Behind the scenes, the orchestration involves managing resources, network communication, and service initialization, leading to the observed variability.
  2. Request Error Rates:

    • Even though the endpoint didn’t produce any bad responses, the request error rates were noticeable. Here are potential reasons:
      • Transient Issues: Some requests might have encountered transient issues (e.g., network glitches, timeouts, or resource contention) without resulting in explicit errors.
      • Latency Thresholds: Requests that exceed certain latency thresholds (e.g., due to resource contention or slow processing) may be counted as errors.
      • Load Spikes: Sudden spikes in traffic can lead to resource saturation, causing intermittent errors.
    • Investigating the logs further might reveal additional context, but sometimes these subtle anomalies remain elusive.
  3. Documentation:

    • While Databricks documentation might not explicitly cover every nuance, consider exploring general performance optimization strategies:
      • Caching: Optimize data caching to reduce redundant computations.
      • Compression: Compress responses to minimize network overhead.
      • Concurrency Control: Ensure proper concurrency management to prevent resource bottlenecks.
      • Monitoring and Alerts: Set up comprehensive monitoring and alerts to catch anomalies promptly.

 

 

Kaizen
Contributor III

Thanks for this. 

1) The odd values I got for the 3600/1800/etc. tests were due to an outlier in my data, so in general a response time of ~183 seconds should be expected.

2) @Kaniz, can we adjust the cluster's scale-to-zero time from 30 minutes to something else?
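The outlier effect mentioned in item 1 is why a robust statistic such as the median is safer than the mean when summarizing response times that may include a cold start. For example (illustrative numbers, not the actual test data):

```python
from statistics import mean, median

# Warm-request latencies in seconds, with one cold-start outlier mixed in.
samples = [1.2, 1.1, 1.3, 1.2, 183.0]

print(round(mean(samples), 1))    # → 37.6, skewed by the single cold start
print(round(median(samples), 1))  # → 1.2, reflects typical warm latency
```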

Kaizen
Contributor III

@s_park / @Sujitha / @Debayan, could one of you address item 2?
