Hey @prashant_089, what you are experiencing should not happen on its own except in some extremely unusual circumstances.
IF YOU ARE USING Databricks Free Edition, you should ignore everything below.
Here are some troubleshooting suggestions/tips:
Likely causes to check first
Workspace disabled/suspended or canceled: When a workspace is disabled or canceled (including transient off/on patterns), the platform can automatically delete serving endpoints. We've seen concrete cases where daily subscription toggling resulted in endpoint deletion, with no user-driven delete in audit logs.
Trial or compliance restrictions: If the customer is on a time-based trial or a compliance-restricted workspace (HIPAA/SHIELD/FedRAMP), serving may be disabled via SAFE flags. As part of enforcement, endpoints are deleted by the NOC process when serving is turned off.
"Scale to zero" does not delete endpoints: The scale_to_zero_enabled flag only stops compute when idle; it does not remove the endpoint. Deletion requires an explicit delete, policy enforcement, or a workspace lifecycle event.
Some things to test:
Confirm via REST GET whether the endpoint still exists. If it's truly deleted, GET returns 404; if it's stopped or updating, you'll get a response and can start it.
curl -s -H "Authorization: Bearer $DATABRICKS_TOKEN" \
https://<workspace-host>/api/2.0/serving-endpoints/<ENDPOINT_NAME>
If this returns 404 Not Found, the endpoint was deleted. If it returns JSON with a state field, the endpoint still exists (possibly stopped).
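If you want to script that 404-versus-JSON check, here is a minimal sketch. The function names classify_endpoint_status and endpoint_http_status are hypothetical helpers I made up for illustration, and the DATABRICKS_HOST variable (holding https://<workspace-host>) is an assumption; only the endpoint path and the DATABRICKS_TOKEN variable come from the curl call above. The 401/403 branch is also my assumption: a bad token or missing permission can look like a missing endpoint, so it is worth ruling out.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn the HTTP status code of the GET call into a verdict.
classify_endpoint_status() {
  case "$1" in
    200)     echo "exists" ;;        # JSON body returned; endpoint is still there
    404)     echo "deleted" ;;       # endpoint no longer exists
    401|403) echo "auth-problem" ;;  # assumption: check token/permissions first
    *)       echo "unknown" ;;       # anything else needs a manual look
  esac
}

# Hypothetical wrapper: call the endpoint and print only the HTTP status code.
# Assumes DATABRICKS_HOST (e.g. https://<workspace-host>) and DATABRICKS_TOKEN are set.
endpoint_http_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    "${DATABRICKS_HOST}/api/2.0/serving-endpoints/$1"
}
```

Usage would look like: classify_endpoint_status "$(endpoint_http_status <ENDPOINT_NAME>)". Running it on a schedule would also give you a rough timestamp for when the endpoint disappears, which narrows the audit log window.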
Query Audit Logs for a deleteServingEndpoint event around the disappearance window. Model Serving writes audit events under serverlessRealTimeInference with the endpoint name in request params.
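-- Column names follow the system.access.audit schema (event_time, service_name, action_name).
SELECT
  event_time,
  service_name,
  action_name,
  request_params,
  user_identity
FROM system.access.audit
WHERE service_name = 'serverlessRealTimeInference'
  AND action_name = 'deleteServingEndpoint'
  -- request_params is a map: match the name key, or fall back to a text search
  AND (request_params['name'] = '<ENDPOINT_NAME>'
       OR to_json(request_params) LIKE '%<ENDPOINT_NAME>%')
  AND event_time BETWEEN TIMESTAMP('<START>') AND TIMESTAMP('<END>')
ORDER BY event_time DESC;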
Correlate workspace state changes (disable/enable/suspend) in system tables to the same time window. This helps confirm if deletion was triggered by workspace lifecycle rather than a user/API call.
-- Replace <WORKSPACE_ID> with your workspace ID.
SELECT *
FROM system.access.workspaces_latest
WHERE workspace_id = '<WORKSPACE_ID>';
Quick checklist to run now
Run the GET endpoint check to confirm deletion vs. stopped/updating.
Query Audit Logs for deleteServingEndpoint in the relevant window to see if a user/API deletion occurred.
Query workspaces_latest to detect disable/enable/suspend events around the time the endpoint vanished.
Determine whether the customer's workspace is trial or subject to compliance restrictions that might disable serving; if so, expect enforced deletions until serving is enabled.
Best of luck, Louis.