Hi @chidifrank, based on the information provided, this looks like the behavior you see when scale to zero is enabled: the endpoint scales down to zero after observing no traffic for 30 minutes.
In your case, however, the scale_to_zero option has been turned off, and you also mention that the logs available for the serving endpoint are not insightful.
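Since the setting is reported as off, it may be worth confirming what the endpoint's active configuration actually says. The following is a minimal sketch, assuming the serving endpoints REST API returns a `config` object whose `served_entities` (or `served_models` on older endpoints) each carry a `scale_to_zero_enabled` field. The host, token, and endpoint name are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_get_endpoint_request(host, token, endpoint_name):
    """Build (but do not send) a GET request for the endpoint description."""
    url = f"{host.rstrip('/')}/api/2.0/serving-endpoints/{endpoint_name}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def scale_to_zero_flags(endpoint_description):
    """Map each served entity/model name to its scale_to_zero_enabled value.

    Assumes the response shape described above; missing keys yield an
    empty result (or None values) rather than raising.
    """
    config = endpoint_description.get("config") or {}
    served = config.get("served_entities") or config.get("served_models") or []
    return {item.get("name"): item.get("scale_to_zero_enabled") for item in served}


def fetch_scale_to_zero_flags(host, token, endpoint_name):
    """Call the workspace and return the flags for the named endpoint."""
    request = build_get_endpoint_request(host, token, endpoint_name)
    with urllib.request.urlopen(request, timeout=30) as response:
        return scale_to_zero_flags(json.load(response))
```

If any entry comes back as `True`, the endpoint can still scale to zero for that served model even though the option appears to be off elsewhere.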
One possible factor is periodic health checks or connection tests in the user application that open connections to the endpoint. Each openSession request resets the auto-stop clock, so the endpoint's idle timer follows those connections rather than real user traffic. If the checks stop or do not reach the endpoint, the clock is no longer reset and the endpoint can still scale to zero even though you expect it to be seeing traffic.
To resolve this issue when the endpoint backs a latency-sensitive application, Databricks recommends either not scaling to zero or sending warmup requests to the endpoint before user-facing traffic arrives at your service.
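A warmup request can be as simple as one small scoring call sent ahead of real traffic. Here is a rough sketch, assuming the endpoint is invoked at `/serving-endpoints/<name>/invocations` with a bearer token and accepts a `dataframe_records` payload. The sample record, the generous timeout, and the helper names are illustrative assumptions, so adjust them to your model's input schema.

```python
import json
import urllib.request


def build_warmup_request(host, token, endpoint_name, sample_record):
    """Build (but do not send) a single-row scoring request used as a warmup."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # Assumed payload format; use whichever input format your model expects.
    body = json.dumps({"dataframe_records": [sample_record]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_warmup(host, token, endpoint_name, sample_record, timeout=300):
    """Send the warmup call and return the HTTP status code.

    The long default timeout is an assumption: a cold endpoint may take a
    while to start, so the first call should be allowed to wait.
    """
    request = build_warmup_request(host, token, endpoint_name, sample_record)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send_warmup(...)` from a scheduled job shortly before expected traffic is one way to apply this; the prediction itself can be discarded.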
Another possible reason is throttling at the Azure Resource Manager level, which causes the endpoint to take longer to transition to Running. In that case, the cluster's Spark logs can help narrow down the cause. If you suspect a Spark performance issue, follow the usual performance tuning approaches, or file a support ticket for further investigation.