Hi all,
I'm hitting a (likely memory-related) error when sending POST requests to my real-time endpoint, and I can't find a hardware setting to increase memory, as suggested by the service logs (below).
Steps to reproduce:
(1) I registered a custom MLflow model with utility functions included via the code_path argument of log_model(), as described in this doc
(2) I deployed the registered model as a Serving Endpoint
(3) When I send requests to the endpoint through the `score_model()` function, I get the following response: Exception: Request failed with status 400, {"error_code":"Bad request.","message":"The model server has crashed unexpectedly. This happens e.g. if server runs out of memory. Please verify that your model can handle the volume and the type of requests with the current configuration."}
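For context, my registration and scoring code roughly follows this shape (a sketch, not my exact code; the endpoint name, workspace URL, and feature names are placeholders):

```python
import json

# Step (1), sketched: log the model with helper modules bundled via code_path.
# (Commented out here because it needs an MLflow tracking server.)
# import mlflow
# mlflow.pyfunc.log_model(
#     artifact_path="model",
#     python_model=MyModel(),
#     code_path=["utils/"],  # helper functions shipped alongside the model
# )

def build_payload(records):
    """Step (3): wrap input rows in a JSON body for the invocations API."""
    return json.dumps({"dataframe_records": records})

# Sending the request (workspace URL and token are placeholders):
# import requests
# response = requests.post(
#     "https://<workspace-url>/serving-endpoints/<endpoint-name>/invocations",
#     headers={"Authorization": f"Bearer {token}",
#              "Content-Type": "application/json"},
#     data=build_payload([{"feature_a": 1.0, "feature_b": 2.0}]),
# )
```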
Steps I have attempted to resolve this issue:
- I changed the concurrency from Small to Large, but there was no change in the response
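One other mitigation I'm considering (sketch only; the batch size of 100 is a guess, not a recommendation) is splitting the input into smaller batches so each request carries less data:

```python
def chunked(rows, batch_size):
    """Yield successive slices of at most batch_size rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

# Each chunk would then be sent as its own scoring request:
# for batch in chunked(all_rows, 100):
#     score_model(batch)
```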
Below are my service logs:
[95wb9] [2023-10-10 00:08:42 +0000] [2] [INFO] Starting gunicorn 21.2.0
[95wb9] [2023-10-10 00:08:42 +0000] [2] [INFO] Listening at: http://0.0.0.0:8080 (2)
[95wb9] [2023-10-10 00:08:42 +0000] [2] [INFO] Using worker: sync
[95wb9] [2023-10-10 00:08:42 +0000] [5] [INFO] Booting worker with pid: 5
[95wb9] [2023-10-10 00:08:43 +0000] [6] [INFO] Booting worker with pid: 6
[95wb9] [2023-10-10 00:08:43 +0000] [7] [INFO] Booting worker with pid: 7
[95wb9] [2023-10-10 00:08:43 +0000] [8] [INFO] Booting worker with pid: 8
[95wb9] [2023-10-10 00:12:53 +0000] [2] [ERROR] Worker (pid:6) was sent SIGKILL! Perhaps out of memory?
[95wb9] [2023-10-10 00:12:53 +0000] [111] [INFO] Booting worker with pid: 111