09-10-2023 09:22 AM
Hi Team,
I registered my ML model in Databricks, but when I try to serve the model through an API endpoint, it fails with the following error logs.
09-11-2023 04:36 AM
Hi @ombhuyan, Based on the given information, the error logs indicate that there are no replicas in a running state for the ML model serving endpoint. This could be due to a failure in the model validation process or an issue with the Databricks service.
To troubleshoot this issue, you can take the following steps:
1. Check the event log: The event log may provide more details about the failure during model validation. You can access the event log to see if there are any specific error messages or warnings related to the model serving endpoint.
2. Contact Databricks support: If the event log does not provide enough information or you cannot resolve the issue on your own, please file a support ticket with Databricks support for assistance.
09-15-2023 12:51 PM - edited 09-15-2023 12:52 PM
I am having the same issue on the large compute size, except my error looks like this:
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Starting gunicorn 21.2.0
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Listening at: http://0.0.0.0:8080 (2)
[rkxn8] [2023-09-15 19:49:24 +0000] [2] [INFO] Using worker: sync
[rkxn8] [2023-09-15 19:49:24 +0000] [3] [INFO] Booting worker with pid: 3
[rkxn8] [2023-09-15 19:49:24 +0000] [4] [INFO] Booting worker with pid: 4
[rkxn8] [2023-09-15 19:49:24 +0000] [5] [INFO] Booting worker with pid: 5
[rkxn8] [2023-09-15 19:49:24 +0000] [6] [INFO] Booting worker with pid: 6
[rkxn8] [2023-09-15 19:49:57 +0000] [2] [ERROR] Worker (pid:5) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:49:57 +0000] [29] [INFO] Booting worker with pid: 29
[rkxn8] [2023-09-15 19:50:05 +0000] [2] [ERROR] Worker (pid:3) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:05 +0000] [33] [INFO] Booting worker with pid: 33
[rkxn8] [2023-09-15 19:50:48 +0000] [2] [ERROR] Worker (pid:6) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:50:48 +0000] [57] [INFO] Booting worker with pid: 57
[rkxn8] [2023-09-15 19:51:00 +0000] [2] [ERROR] Worker (pid:4) was sent SIGKILL! Perhaps out of memory?
[rkxn8] [2023-09-15 19:51:00 +0000] [63] [INFO] Booting worker with pid: 63
Trying to deploy a 1.5B param model.
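For anyone hitting the same SIGKILL loop, a rough back-of-the-envelope check may help decide whether memory is the likely culprit. The sketch below estimates the memory needed just for the weights. It assumes float32 weights (4 bytes per parameter) and assumes each of the four gunicorn workers seen in the log loads its own copy of the model; neither assumption is confirmed in this thread, and the function name is purely illustrative. Activations, the Python runtime, and library overhead are not counted.

```python
def estimate_weights_gib(num_params, bytes_per_param=4, num_workers=1):
    """Approximate memory (GiB) needed to hold model weights only.

    bytes_per_param=4 assumes float32; use 2 for float16/bfloat16.
    num_workers assumes every worker process holds a separate copy.
    """
    return num_params * bytes_per_param * num_workers / 2**30


params = 1_500_000_000  # the 1.5B-parameter model mentioned above

per_worker = estimate_weights_gib(params)
all_workers = estimate_weights_gib(params, num_workers=4)  # 4 workers booted in the log

print(f"weights per worker: {per_worker:.2f} GiB")
print(f"weights across 4 workers: {all_workers:.2f} GiB")
```

If the total comes out close to or above the memory available to the endpoint, the "Perhaps out of memory?" hint in the log is probably accurate, and a smaller dtype or a larger compute size would be the things to try first.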
09-21-2023 02:20 PM
03-08-2024 06:07 AM
Hi @Kumaran,
The linked code no longer seems to be available. Do you know whether there is an alternative link to it?
Thank you!
Octavian
11-10-2023 01:10 AM
@ombhuyan We currently only surface logs to the user from the build phase (i.e., where we install the pip dependencies); we don't surface logs from the pre-build phase (i.e., where we download the model).
That's why you may not see clear error messages in the build logs.
Please create an SF case if you still see this issue.
03-08-2024 06:10 AM
Hi,
I also ran into this issue. It would be very useful to be able to see the errors raised in the pre-build stage as well.
In any case, in case it helps: I eventually found out through trial and error that the problem was caused by an incompatible version of one of the packages that were supposed to be installed in the container.
Octavian
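To shorten the trial and error described above, one option is to compare the versions pinned for the model against what is actually installed in a given environment before deploying. This is only a minimal sketch: the helper name `find_mismatches` and the example pins are hypothetical, it handles exact `==` pins only, and it checks whichever environment you run it in, not the serving container itself.

```python
from importlib import metadata


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed)} for every pin that is not met.

    pins maps package names to exact version strings.
    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Hypothetical pins, e.g. copied by hand from the model's requirements file.
example_pins = {"some-package": "1.2.3"}
for pkg, (wanted, installed) in find_mismatches(example_pins).items():
    print(f"{pkg}: pinned {wanted}, installed {installed}")
```

Running this in a fresh environment built from the same requirements is a cheap way to spot a bad pin before waiting on an endpoint build.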