09-10-2023 09:22 AM
Hi Team,
I registered my ML model in Databricks, but when I try to serve it through an API endpoint, the deployment fails with the following error logs.
09-11-2023 04:36 AM
Hi @ombhuyan, based on the information given, the error logs indicate that none of the replicas of the ML model serving endpoint are in a running state. This could be caused by a failure during model validation or by an issue with the Databricks service.
To troubleshoot this issue, you can take the following steps:
1. Check the event log: it may give more details about the failure during model validation, including specific error messages or warnings related to the model serving endpoint.
2. Contact Databricks support: If the event log does not provide enough information or you cannot resolve the issue on your own, please file a support ticket with Databricks support for assistance.
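If you have copied the endpoint's logs out of the UI, a small script can flag common failure signatures before you dig through them by hand. This is a minimal sketch, assuming the log is plain text. The signature list is illustrative and not exhaustive. The first pattern is the message gunicorn prints when one of its worker processes is killed with SIGKILL, which is often the kernel's out-of-memory killer.

```python
import re
from collections import Counter

# Illustrative failure signatures (an assumption, extend as needed). The first is the
# message gunicorn logs when a worker process is killed with SIGKILL.
SIGNATURES = {
    "oom_kill": re.compile(r"was sent SIGKILL! Perhaps out of memory\?"),
    "error_line": re.compile(r"\[ERROR\]"),
    "python_traceback": re.compile(r"Traceback \(most recent call last\)"),
}


def summarize_log(text: str) -> Counter:
    """Count how many lines of a copied log match each signature."""
    counts: Counter = Counter()
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[2] [INFO] Booting worker with pid: 5\n"
        "[2] [ERROR] Worker (pid:5) was sent SIGKILL! Perhaps out of memory?\n"
    )
    print(dict(summarize_log(sample)))
```

A non-zero `oom_kill` count points toward memory sizing. A `python_traceback` hit usually means the model or one of its dependencies failed to load.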
09-15-2023 12:51 PM - edited 09-15-2023 12:52 PM
I am having the same issue on the large compute size, except that my error looks like this:
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Starting gunicorn 21.2.0
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Listening at: http://0.0.0.0:8080 (2)
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Using worker: sync
[rkxn8] [2023-09-15 19:49:24 +0000] [3] [INFO] Booting worker with pid: 3
[rkxn8] [2023-09-15 19:49:24 +0000] [4] [INFO] Booting worker with pid: 4
[rkxn8] [2023-09-15 19:49:24 +0000] [5] [INFO] Booting worker with pid: 5
[rkxn8] [2023-09-15 19:49:24 +0000] [6] [INFO] Booting worker with pid: 6
[rkxn8] [2023-09-15 19:49:57 +0000] [2] [ERROR] Worker (pid:5) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:49:57 +0000] [29] [INFO] Booting worker with pid: 29
[rkxn8] [2023-09-15 19:50:05 +0000] [2] [ERROR] Worker (pid:3) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:05 +0000] [33] [INFO] Booting worker with pid: 33
[rkxn8] [2023-09-15 19:50:48 +0000] [2] [ERROR] Worker (pid:6) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:48 +0000] [57] [INFO] Booting worker with pid: 57
[rkxn8] [2023-09-15 19:51:00 +0000] [2] [ERROR] Worker (pid:4) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:51:00 +0000] [63] [INFO] Booting worker with pid: 63
I'm trying to deploy a 1.5B-parameter model.
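The repeated "was sent SIGKILL! Perhaps out of memory?" lines suggest the workers are being killed while loading the model. A back-of-envelope estimate can show whether that is plausible. This is a minimal sketch under two loud assumptions: each of the four gunicorn workers booted in the log (pids 3 to 6) loads its own full copy of the weights, and only the weights are counted. Sync workers are separate processes, but whether the serving container shares weights between them is not stated in this thread.

```python
# Bytes needed to store one parameter in each common dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_weight_memory_gb(n_params: float, dtype: str = "float32", n_workers: int = 1) -> float:
    """Rough lower bound (GB, 1 GB = 1e9 bytes) on RAM used by the model weights alone.

    Assumes every worker process holds its own full copy of the weights and ignores
    activations, tokenizer, Python runtime and framework overhead.
    """
    return n_params * BYTES_PER_PARAM[dtype] * n_workers / 1e9


if __name__ == "__main__":
    # Four workers, as in the gunicorn log (an assumption about how the model is loaded).
    for dtype in ("float32", "float16"):
        print(dtype, estimate_weight_memory_gb(1.5e9, dtype, n_workers=4), "GB")
```

At float32, 1.5B parameters take about 6 GB per copy, so four copies would need roughly 24 GB before any overhead (about 12 GB at float16). Compare that figure against the memory available on the chosen compute size.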
09-21-2023 02:20 PM
03-08-2024 06:07 AM
Hi @Kumaran,
The linked code no longer seems to be available. Do you know whether there is an alternative link?
Thank you!
Octavian
11-10-2023 01:10 AM
@ombhuyan We currently show users only the logs from the build phase (i.e., where the pip dependencies are installed). Logs from the pre-build phase (i.e., where the model is downloaded) are not uploaded.
That's why you may not see clear error messages in the build logs.
Please create an SF case if you still see this issue.
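The build logs can also be fetched over the REST API, which is handy for saving them or scanning them with a script. This is a minimal standard-library sketch. It assumes the `GET /api/2.0/serving-endpoints/{name}/served-models/{served_model_name}/build-logs` route, a personal access token, and a JSON response that carries the text in a `logs` field. The host, token, endpoint name and served-model name are placeholders. As noted above, these logs cover only the build phase, so a failure while downloading the model may not show up in them.

```python
import json
import urllib.request
from urllib.parse import quote


def build_logs_url(host: str, endpoint_name: str, served_model_name: str) -> str:
    """Assemble the (assumed) build-logs URL for one served model of an endpoint."""
    return (
        f"{host.rstrip('/')}/api/2.0/serving-endpoints/"
        f"{quote(endpoint_name, safe='')}/served-models/"
        f"{quote(served_model_name, safe='')}/build-logs"
    )


def fetch_build_logs(host: str, token: str, endpoint_name: str, served_model_name: str) -> str:
    """GET the build logs and return the 'logs' field of the JSON response."""
    request = urllib.request.Request(
        build_logs_url(host, endpoint_name, served_model_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload.get("logs", "")
```

Usage would look like `fetch_build_logs("https://<workspace-host>", token, "my-endpoint", "my-model-1")`, where both names are hypothetical and should match what the endpoint page shows.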
03-08-2024 06:10 AM
Hi,
I also ran into this issue. It would be very useful to be able to see the errors raised during the pre-build stage as well.
In case it helps: through trial and error, I eventually found that the problem was caused by an incompatible version of one of the packages to be installed in the container.
Octavian
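MLflow saves a requirements.txt (plus conda.yaml) alongside each logged model, and the pip dependencies installed in the build phase typically come from it. One way to shorten the trial and error is to diff those pins against the environment where the model was trained and tested. This is a minimal sketch that handles only simple `name==version` pins. Package names are compared as written (lowercased, without PEP 503 normalization), and the versions in the example are hypothetical.

```python
from importlib import metadata
from typing import Dict, Mapping, Optional, Tuple


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Extract simple `name==version` pins from requirements.txt content."""
    pins: Dict[str, str] = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers such as '; python_version<"3.9"'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue  # skip blanks, pip options (-r, --index-url) and unpinned/ranged specs
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip().lower()  # drop extras like pkg[extra]
        pins[name] = version.strip()
    return pins


def find_mismatches(
    pins: Mapping[str, str], installed: Optional[Mapping[str, str]] = None
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (pinned, found) whenever the two versions differ.

    If `installed` is None, versions are looked up in the current Python environment;
    `found` is None when the package is not installed there.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        if installed is not None:
            found = installed.get(name)
        else:
            try:
                found = metadata.version(name)
            except metadata.PackageNotFoundError:
                found = None
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; in practice, read the requirements.txt logged with the model.
    example = "mlflow==2.7.1\nnumpy==1.24.4  # pinned at training time\n"
    for pkg, (pinned, found) in find_mismatches(parse_pins(example)).items():
        print(f"{pkg}: model pins {pinned}, environment has {found}")
```

Running this in the training environment shows which pins have drifted. Those packages are the first candidates to check when the container build or model load fails.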