Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Serving API endpoint failing

ombhuyan
New Contributor II

Hi Team,
I registered my ML model in Databricks, but when I try to serve it through an API endpoint, the deployment fails with the following error logs.

Service logs: There are currently no replicas in a running state.
Build logs: Build never started - check the event log to see if the model failed validation or contact databricks.
Can someone help me debug the issue?
 
 
6 REPLIES

Kaniz_Fatma
Community Manager

Hi @ombhuyan, based on the error logs you shared, the model serving endpoint has no replicas in a running state. This could be due to a failure during model validation or an issue with the Databricks service.

To troubleshoot this issue, you can take the following steps:

1. Check the event log: The event log may provide more details about the failure during model validation. You can access the event log to see if there are any specific error messages or warnings related to the model serving endpoint.

2. Contact Databricks support: If the event log does not provide enough information or you cannot resolve the issue on your own, please file a support ticket with Databricks support for assistance. 

AChang
New Contributor III

I am having the same issue on the large compute, except my error logs look like this:

[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Starting gunicorn 21.2.0
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Listening at: http://0.0.0.0:8080 (2)
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Using worker: sync
[rkxn8] [2023-09-15 19:49:24 +0000] [3] [INFO] Booting worker with pid: 3
[rkxn8] [2023-09-15 19:49:24 +0000] [4] [INFO] Booting worker with pid: 4
[rkxn8] [2023-09-15 19:49:24 +0000] [5] [INFO] Booting worker with pid: 5
[rkxn8] [2023-09-15 19:49:24 +0000] [6] [INFO] Booting worker with pid: 6
[rkxn8] [2023-09-15 19:49:57 +0000] [2] [ERROR] Worker (pid:5) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:49:57 +0000] [29] [INFO] Booting worker with pid: 29
[rkxn8] [2023-09-15 19:50:05 +0000] [2] [ERROR] Worker (pid:3) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:05 +0000] [33] [INFO] Booting worker with pid: 33
[rkxn8] [2023-09-15 19:50:48 +0000] [2] [ERROR] Worker (pid:6) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:48 +0000] [57] [INFO] Booting worker with pid: 57
[rkxn8] [2023-09-15 19:51:00 +0000] [2] [ERROR] Worker (pid:4) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:51:00 +0000] [63] [INFO] Booting worker with pid: 63

Trying to deploy a 1.5B param model.
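
The repeated "Worker ... was sent SIGKILL! Perhaps out of memory?" lines are consistent with the container running out of memory while the model loads. A rough back-of-the-envelope check, sketched below, multiplies the parameter count by the bytes per parameter and by the number of worker processes. It assumes that each of the four gunicorn workers seen in the log loads its own copy of the weights and that the weights are stored as float32 or float16; neither is confirmed in this thread. The function name is made up for illustration, and the estimate covers weights only (no activations, tokenizer, or framework overhead), so treat it as a lower bound.

```python
def estimate_weight_memory_gib(num_params, bytes_per_param, num_workers=1):
    """Lower-bound estimate of memory used by model weights alone, in GiB.

    Assumes every worker process holds a full, unshared copy of the weights.
    """
    total_bytes = num_params * bytes_per_param * num_workers
    return total_bytes / 2**30


NUM_PARAMS = 1.5e9  # "1.5B param model" from the post above

# Assumed storage widths; the actual dtype of the model is not stated in the thread.
for label, width in (("float32", 4), ("float16", 2)):
    one_copy = estimate_weight_memory_gib(NUM_PARAMS, width)
    four_copies = estimate_weight_memory_gib(NUM_PARAMS, width, num_workers=4)
    print(f"{label}: {one_copy:.1f} GiB per copy, {four_copies:.1f} GiB for 4 workers")
```

If the total is near or above the memory available to the endpoint, a larger compute size, a lower-precision copy of the weights, or fewer worker processes may be worth trying.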

Kumaran
Valued Contributor III

Hi @ombhuyan,

Thank you for posting your question in the Databricks Community.

I can't pinpoint the issue without seeing the code. However, could you compare your code with the example code here and see what is missing?

Hi @Kumaran,
The linked code seems not to be available. Do you know whether there is an alternative link to it?
Thank you!
Octavian

Annapurna_Hiriy
Contributor

@ombhuyan We currently surface logs to the user only for the build phase (i.e., where we install the pip dependencies), not for the pre-build phase (i.e., where we download the model).
That's why you may not see clear error messages in build logs.
Please create an SF case if you still see this issue.

Hi,

I also ran into this issue. It would be very useful to be able to see the errors raised in the pre-build stage as well.

In any case, in the hope that it helps: I eventually found out through trial and error that the problem was caused by an incompatible version of one of the packages that was supposed to be installed in the container.

Octavian
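
One way to shorten the trial and error described above is to compare the versions pinned in the model's requirements file with what is actually installed in the environment where the model was trained and tested. The sketch below is a minimal illustration under stated assumptions: it only understands simple `name==version` pins, the function names and the sample package are hypothetical, and it inspects the local Python environment rather than the serving container itself.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Extract {package: version} from lines of the form 'name==version'.

    Comments, environment markers, and non-exact specifiers (>=, ~=) are ignored.
    """
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_version_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed_or_None)} for pins that do not match."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Hypothetical requirements content, for illustration only.
sample = "example-package==1.2.3  # placeholder name\n"
for name, (pinned, installed) in find_version_mismatches(parse_pins(sample)).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```

Any package reported here is a candidate for the kind of incompatibility mentioned above; a pin that cannot be resolved together with the other pins may also fail at install time even when the local environment looks consistent.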
