Serving API endpoint failing

ombhuyan
New Contributor II

Hi Team,
I registered my ML model in Databricks, but when I try to serve it through an API endpoint, the deployment fails with the following logs:

Service logs: There are currently no replicas in a running state.
Build logs: Build never started - check the event log to see if the model failed validation or contact databricks.

Can someone help me debug this?
 
 

Kaniz
Community Manager

Hi @ombhuyan,

Based on the information given, the logs indicate that no replicas of the model serving endpoint are in a running state. This could be due to a failure during model validation or to an issue on the Databricks side.

To troubleshoot this issue, you can take the following steps:

1. Check the event log: it may give more detail about why model validation failed. Look for specific error messages or warnings related to the serving endpoint.

2. Contact Databricks support: if the event log does not give enough information, or you cannot resolve the issue yourself, file a ticket with Databricks support.
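For step 1, the endpoint's state can also be pulled programmatically. This is a minimal sketch using only the Python standard library: it calls `GET /api/2.0/serving-endpoints/{name}` and summarizes readiness plus the per-model deployment messages. The workspace host, token, and endpoint name are placeholders, and the response field names (`state.ready`, `state.config_update`, `served_models[].state.deployment_state_message`) are assumptions based on the serving-endpoints REST API that should be checked against the current API reference.

```python
from __future__ import annotations

import json
import urllib.request


def fetch_endpoint(host: str, token: str, name: str) -> dict:
    """Fetch an endpoint description via GET /api/2.0/serving-endpoints/{name}.

    `host` (workspace URL) and `token` (personal access token) are placeholders.
    """
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/serving-endpoints/{name}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_state(endpoint: dict) -> list[str]:
    """Summarize readiness and per-model deployment messages.

    The field names are assumptions about the response shape; missing
    fields are tolerated via .get() instead of raising KeyError.
    """
    state = endpoint.get("state", {})
    lines = [f"ready={state.get('ready')} config_update={state.get('config_update')}"]
    # Served models may appear in the active config and/or a pending config update.
    for key in ("config", "pending_config"):
        for model in endpoint.get(key, {}).get("served_models", []):
            m_state = model.get("state", {})
            lines.append(
                f"{key}/{model.get('name')}: {m_state.get('deployment')} - "
                f"{m_state.get('deployment_state_message', '')}"
            )
    return lines
```

Usage: `for line in summarize_state(fetch_endpoint("https://<workspace-host>", "<token>", "<endpoint-name>")): print(line)`. A failed deployment message on the pending config is usually the first thing worth reading.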

AChang
New Contributor III

I am having the same issue on large compute, except my error looks like this:

[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Starting gunicorn 21.2.0
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Listening at: http://0.0.0.0:8080 (2)
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Using worker: sync
[rkxn8] [2023-09-15 19:49:24 +0000] [3] [INFO] Booting worker with pid: 3
[rkxn8] [2023-09-15 19:49:24 +0000] [4] [INFO] Booting worker with pid: 4
[rkxn8] [2023-09-15 19:49:24 +0000] [5] [INFO] Booting worker with pid: 5
[rkxn8] [2023-09-15 19:49:24 +0000] [6] [INFO] Booting worker with pid: 6
[rkxn8] [2023-09-15 19:49:57 +0000] [2] [ERROR] Worker (pid:5) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:49:57 +0000] [29] [INFO] Booting worker with pid: 29
[rkxn8] [2023-09-15 19:50:05 +0000] [2] [ERROR] Worker (pid:3) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:05 +0000] [33] [INFO] Booting worker with pid: 33
[rkxn8] [2023-09-15 19:50:48 +0000] [2] [ERROR] Worker (pid:6) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:48 +0000] [57] [INFO] Booting worker with pid: 57
[rkxn8] [2023-09-15 19:51:00 +0000] [2] [ERROR] Worker (pid:4) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:51:00 +0000] [63] [INFO] Booting worker with pid: 63
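These lines show a crash loop: the gunicorn master (pid 2) reports that each worker died from SIGKILL, typically sent by the kernel's out-of-memory killer, and boots a replacement. All four original workers are killed within about a minute and a half. A small, hypothetical helper like the one below, with regexes written against the exact messages quoted above, makes that pattern easy to confirm in a longer service log.

```python
import re

# Patterns written against the gunicorn messages quoted in the log above.
KILLED_RE = re.compile(r"Worker \(pid:(\d+)\) was sent SIGKILL")
BOOTED_RE = re.compile(r"Booting worker with pid: (\d+)")


def scan_gunicorn_log(text: str) -> dict:
    """Return booted and SIGKILLed worker pids in log order, plus the kill count."""
    booted = [int(p) for p in BOOTED_RE.findall(text)]
    killed = [int(p) for p in KILLED_RE.findall(text)]
    return {"booted": booted, "killed": killed, "restarts": len(killed)}
```

If `restarts` keeps growing and the killed pids cover every worker that was booted at startup, the replica never becomes healthy, which matches the "no replicas in a running state" message.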

I am trying to deploy a 1.5B-parameter model.
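A back-of-the-envelope estimate helps explain the "Perhaps out of memory?" messages. This sketch assumes the weights are held in float32 (4 bytes per parameter) and that each of the four gunicorn workers booted in the log loads its own copy of the model; both are assumptions, not stated in the post. It counts raw weights only, so real usage will be higher.

```python
# Bytes per parameter for common weight dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(n_params: float, dtype: str = "float32", n_copies: int = 1) -> float:
    """Memory for the raw weights only, in GB (1 GB = 1e9 bytes).

    Ignores activations and Python/runtime overhead, so actual usage is higher.
    """
    return n_params * BYTES_PER_PARAM[dtype] * n_copies / 1e9


if __name__ == "__main__":
    n = 1.5e9    # 1.5B parameters, from the post
    workers = 4  # four workers booted in the log (assumed: one model copy each)
    for dtype in ("float32", "float16"):
        print(dtype, weight_memory_gb(n, dtype), weight_memory_gb(n, dtype, workers))
```

For 1.5B parameters this gives 6 GB per copy in float32 (24 GB across four workers) and 3 GB per copy in float16 (12 GB across four). If the total exceeds what the chosen compute size provides, half-precision weights or a larger compute size are the natural things to try.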

Kumaran
Valued Contributor III

Hi @ombhuyan,

Thank you for posting your question in the Databricks Community.

I can't pinpoint the issue without seeing the code. However, could you compare your code against the example code here and see what is missing?

Hi @Kumaran,
The linked code does not seem to be available. Do you know of an alternative link?
Thank you!
Octavian

Annapurna_Hiriy
New Contributor III

@ombhuyan We currently upload logs to the user only during the build phase (i.e., where we install the pip dependencies), not during the pre-build phase (i.e., where we download the model).
That's why you may not see clear error messages in the build logs.
Please create a support (SF) case if you still see this issue.

Hi,

I also ran into this issue. It would be very useful to see the errors raised in the pre-build stage as well.

In case it helps: through trial and error, I eventually found that the problem was caused by an incompatible version of one of the packages to be installed in the container.
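Trial and error works, but a quicker first check is to compare the exact pins in the model's logged requirements (MLflow writes a requirements.txt alongside the model artifacts) with versions known to work, for example those installed in the notebook where the model was trained. A minimal sketch, assuming the requirements file has already been read into a string; `known_good` is a hypothetical dictionary you supply, and only exact `==` pins are compared.

```python
from __future__ import annotations


def parse_pins(requirements: str) -> dict[str, str]:
    """Return {package: version} for exact '==' pins in a requirements.txt string."""
    pins = {}
    for raw in requirements.splitlines():
        # Drop comments and environment markers such as '; python_version < "3.9"'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(requirements: str, known_good: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map package -> (pinned, known_good) wherever the two versions differ."""
    pins = parse_pins(requirements)
    mismatches = {}
    for name, want in known_good.items():
        key = name.lower()
        if key in pins and pins[key] != want:
            mismatches[key] = (pins[key], want)
    return mismatches
```

In the training notebook, `known_good` could be built with `importlib.metadata.version(name)` for each package of interest; unpinned packages are skipped rather than flagged.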

Octavian
