
Serving API endpoint failing

ombhuyan
New Contributor II

Hi Team,
I registered my ML model in Databricks, but when I try to create a serving API endpoint for the model, it fails with the following error logs.

Service logs: There are currently no replicas in a running state.
Build logs: Build never started - check the event log to see if the model failed validation or contact databricks.
Can someone help me debug the issue?
 
 

AChang
New Contributor III

I am having the same issue on the large compute size, except my error looks like this:

[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Starting gunicorn 21.2.0
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Listening at: http://0.0.0.0:8080 (2)
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Using worker: sync
[rkxn8] [2023-09-15 19:49:24 +0000] [3] [INFO] Booting worker with pid: 3
[rkxn8] [2023-09-15 19:49:24 +0000] [4] [INFO] Booting worker with pid: 4
[rkxn8] [2023-09-15 19:49:24 +0000] [5] [INFO] Booting worker with pid: 5
[rkxn8] [2023-09-15 19:49:24 +0000] [6] [INFO] Booting worker with pid: 6
[rkxn8] [2023-09-15 19:49:57 +0000] [2] [ERROR] Worker (pid:5) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:49:57 +0000] [29] [INFO] Booting worker with pid: 29
[rkxn8] [2023-09-15 19:50:05 +0000] [2] [ERROR] Worker (pid:3) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:05 +0000] [33] [INFO] Booting worker with pid: 33
[rkxn8] [2023-09-15 19:50:48 +0000] [2] [ERROR] Worker (pid:6) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:48 +0000] [57] [INFO] Booting worker with pid: 57
[rkxn8] [2023-09-15 19:51:00 +0000] [2] [ERROR] Worker (pid:4) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:51:00 +0000] [63] [INFO] Booting worker with pid: 63

Trying to deploy a 1.5B param model.
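
For anyone else hitting the same SIGKILL messages, a rough back-of-the-envelope memory estimate may help. The sketch below only counts the model weights and assumes each of the four gunicorn workers in the log loads its own copy of the model; that assumption, the helper names, and the bytes-per-parameter values are illustrative and not confirmed by Databricks. Real usage will be higher once activations, tokenizer data, and framework overhead are included.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption (not confirmed in this thread): every worker process
# loads its own full copy of the weights.

GIB = 2**30  # bytes in one gibibyte


def estimate_weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory for one copy of the weights, in GiB."""
    return n_params * bytes_per_param / GIB


def estimate_total_gib(n_params: float, bytes_per_param: int, n_workers: int) -> float:
    """Memory for n_workers independent copies of the weights, in GiB."""
    return estimate_weights_gib(n_params, bytes_per_param) * n_workers


if __name__ == "__main__":
    n_params = 1.5e9   # the 1.5B-parameter model mentioned above
    n_workers = 4      # four workers were booted in the log (pids 3-6)
    for label, width in (("float32", 4), ("float16", 2)):
        one = estimate_weights_gib(n_params, width)
        total = estimate_total_gib(n_params, width, n_workers)
        print(f"{label}: {one:.2f} GiB per copy, {total:.2f} GiB for {n_workers} workers")
```

If the total comes out near or above the memory available to the serving container, an out-of-memory kill during worker startup would be consistent with the log above.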

Kumaran
Databricks Employee

Hi @ombhuyan,

Thank you for posting your question in the Databricks Community.

I can't tell what the issue is without seeing the code. However, could you compare your code with the example code here and see what is missing?

Hi @Kumaran,
The linked code does not seem to be available. Do you know whether there is an alternative link to it?
Thank you!
Octavian

Annapurna_Hiriy
Databricks Employee

@ombhuyan We currently only upload logs to the user during the build phase (i.e., where we install the pip dependencies), but we don't upload logs during the pre-build phase (i.e., where we download the model).
That's why you may not see clear error messages in build logs.
Please create an SF case if you still see this issue.

Hi,

I also ran into this issue. I would find it very useful to be able to see the errors raised in the pre-build stage as well.

In any case, if it helps, I eventually found out through trial and error that the problem was caused by an incompatible version of one of the packages that was supposed to be installed in the container.
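
To shorten that trial and error, one option is to compare the versions pinned for the serving container against what is installed in the environment where the model was trained and tested. The following is only a sketch using the standard library: the function name and the example pins are hypothetical, and it checks exact version equality rather than full requirement specifiers.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_version_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each mismatched package to (pinned version, installed version)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; replace with the versions your model actually declares.
    pins = {"numpy": "1.24.4", "scikit-learn": "1.3.0"}
    for name, (wanted, found) in find_version_mismatches(pins).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

Any package reported here is a candidate for the kind of incompatibility described above, and pinning the versions that are known to work when logging the model keeps the container build closer to the tested environment.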

Octavian
