
Provisioned concurrency of serving endpoints scales to zero

chidifrank
New Contributor II

Hi,

 
We provisioned the endpoint with 4 DBUs and also disabled the scale_to_zero option. For some reason, it randomly drops to 0 provisioned concurrency. Logs available in the serving endpoint service are not insightful.
 
We are now provisioning the endpoint with 8 DBUs, but it still randomly drops to 4. What might be the issue?

chidifrank_0-1690968368091.png

 

 
3 REPLIES

Kaniz
Community Manager

Hi @chidifrank, based on the provided information, the endpoint seems to be scaling down to zero, which normally happens after the endpoint observes no traffic for 30 minutes while scale to zero is enabled.

However, in this case, the scale_to_zero option has been turned off. It is also mentioned that the logs available in the serving endpoint service are not insightful. 
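One way to confirm that the setting is really off on the deployed configuration is to inspect the endpoint description returned by the serving endpoints REST API. The sketch below is a minimal, hedged example: it assumes the parsed JSON body of `GET /api/2.0/serving-endpoints/{name}` exposes `config` and `pending_config` sections, each with a `served_models` list carrying a `scale_to_zero_enabled` flag. The helper name `scale_to_zero_report` and the sample response are hypothetical.

```python
from typing import Any


def scale_to_zero_report(endpoint: dict[str, Any]) -> dict[str, bool]:
    """Map each served model to its scale_to_zero_enabled flag.

    `endpoint` is assumed to be the parsed JSON body of
    GET /api/2.0/serving-endpoints/{name} (field names are assumptions).
    """
    report: dict[str, bool] = {}
    # Look at the active config and at any config update still rolling out.
    for section in ("config", "pending_config"):
        cfg = endpoint.get(section) or {}
        for model in cfg.get("served_models", []):
            key = f"{section}:{model.get('name', '<unnamed>')}"
            report[key] = bool(model.get("scale_to_zero_enabled", False))
    return report


# Hypothetical response, trimmed to the fields used above.
sample = {
    "name": "my-endpoint",
    "config": {
        "served_models": [
            {
                "name": "model-1",
                "workload_size": "Small",
                "scale_to_zero_enabled": False,
            }
        ]
    },
}

print(scale_to_zero_report(sample))
```

If any entry in the report is `True`, including one under `pending_config`, the endpoint may still be allowed to scale to zero even though the option looks disabled elsewhere.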

One possible reason for this issue could be that the user application runs periodic health checks or connection tests that open connections to the endpoint. Each openSession request resets the auto-stop clock, which might lead to the endpoint scaling to zero despite traffic.

To resolve this issue when the feature is used with a latency-sensitive application, Databricks recommends either not scaling to zero or sending warmup requests to the endpoint before user-facing traffic arrives at your service.
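As a rough illustration of a warmup request, the sketch below builds (but does not send) a POST to the endpoint's invocations URL using only the standard library. The workspace host, endpoint name, token placeholder, and payload are hypothetical, and the `dataframe_records` payload shape is an assumption that depends on the served model's signature.

```python
import json
import urllib.request


def build_warmup_request(
    host: str, endpoint_name: str, token: str, payload: dict
) -> urllib.request.Request:
    """Build a POST request for the endpoint's invocations URL."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values; replace with your workspace URL, endpoint, and token.
req = build_warmup_request(
    "https://example.cloud.databricks.com",
    "my-endpoint",
    "<token>",
    {"dataframe_records": [{"feature": 1.0}]},
)

# To actually send the warmup request:
# urllib.request.urlopen(req, timeout=60)
```

Sending such a request shortly before expected traffic only helps with cold-start latency; it would not explain or prevent a drop in provisioned concurrency when scale to zero is already disabled.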

Another possible reason for this issue could be throttling at the Azure Resource Manager, which causes the endpoint to take longer to transition to Running. In this case, the cluster's Spark logs can help identify the cause. If a Spark performance issue is suspected, follow the approaches for performance tuning, or file a support ticket for further investigation.

chidifrank
New Contributor II

Hi,

I apologize if my question wasn't clear; let me clarify it.
We are not using the scale_to_zero option, and we are not sending any warmup requests, so the endpoint should never scale to zero regardless of whether there is traffic, right?

 

chidifrank_0-1691048861787.png

 

Kaniz
Community Manager

Thank you for your response, @chidifrank.
Let me get back to you on it.
