Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Model serving Ran out of memory

Poised2Learn
New Contributor III


Hi fellows,

I encountered a memory(?) error when sending POST requests to my real-time endpoint, and I'm unable to find a hardware setting to increase memory, as suggested by the service logs (below).

Steps to repro:

(1) I registered a custom MLflow model with utility functions included in the code_path argument of log_model(), as described in this doc.
(2) I deployed the registered model as a serving endpoint.
(3) Upon sending requests to the endpoint through my `score_model()` function, I get the following response: Exception: Request failed with status 400, {"error_code":"Bad request.","message":"The model server has crashed unexpectedly. This happens e.g. if server runs out of memory. Please verify that your model can handle the volume and the type of requests with the current configuration."}
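For context, my `score_model()` function follows the usual pattern of POSTing JSON to the endpoint's invocations URL. A minimal sketch (the URL, token environment variable, and helper names here are placeholders, not my exact code):

```python
import json
import os
import urllib.request


def build_payload(records):
    """Wrap rows in the "dataframe_records" JSON format accepted by
    Databricks model serving: a list of {column: value} dicts."""
    return json.dumps({"dataframe_records": records}).encode("utf-8")


def score_model(endpoint_url, records):
    """POST one batch of rows to a real-time serving endpoint.

    endpoint_url looks like
    https://<workspace-url>/serving-endpoints/<endpoint-name>/invocations
    and DATABRICKS_TOKEN must hold a personal access token.
    """
    req = urllib.request.Request(
        endpoint_url,
        data=build_payload(records),
        headers={
            "Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

A non-200 status from this call carries the server's error body, which is where the "model server has crashed unexpectedly" message above comes from.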


Steps I have attempted to resolve this issue:

- I tried changing the concurrency from Small to Large, but the response did not change.

Below are my service logs:

[95wb9] [2023-10-10 00:08:42 +0000] [2] [INFO] Starting gunicorn 21.2.0
[95wb9] [2023-10-10 00:08:42 +0000] [2] [INFO] Listening at: http://0.0.0.0:8080 (2)
[95wb9] [2023-10-10 00:08:42 +0000] [2] [INFO] Using worker: sync
[95wb9] [2023-10-10 00:08:42 +0000] [5] [INFO] Booting worker with pid: 5
[95wb9] [2023-10-10 00:08:43 +0000] [6] [INFO] Booting worker with pid: 6
[95wb9] [2023-10-10 00:08:43 +0000] [7] [INFO] Booting worker with pid: 7
[95wb9] [2023-10-10 00:08:43 +0000] [8] [INFO] Booting worker with pid: 8
[95wb9] [2023-10-10 00:12:53 +0000] [2] [ERROR] Worker (pid:6) was sent SIGKILL! Perhaps out of memory?
[95wb9] [2023-10-10 00:12:53 +0000] [111] [INFO] Booting worker with pid: 111
1 ACCEPTED SOLUTION

Poised2Learn
New Contributor III

Thank you for your responses, @Annapurna_Hiriy and @Retired_mod.

Indeed, it appears my original model (~800 MB) was too large for the current server. Based on your suggestion, I built a simpler/smaller model for this project, and I was then able to deploy it and get responses successfully.

I will reach out to the support team to increase our compute configuration so we can handle other (larger) models.


2 REPLIES 2

Annapurna_Hiriy
Databricks Employee

@Poised2Learn What do you see under Memory Usage in the Metrics tab? If you still see memory utilization over 70% after increasing the compute, please reach out to the Databricks support team to increase the compute for you. 
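If you want to adjust the serving compute yourself before involving support, the workload size of a served model can be changed through the serving-endpoints REST API. A rough sketch (the endpoint name, entity fields, and workspace URL are placeholders; check the API reference for the exact schema, and note that workload size controls provisioned concurrency, while per-replica memory limits may still require a support request):

```python
import json
import os
import urllib.request


def build_config(entity_name, entity_version, workload_size):
    """Build the body for PUT /api/2.0/serving-endpoints/{name}/config.

    workload_size is one of "Small", "Medium", "Large"; larger sizes
    provision more concurrency for the endpoint.
    """
    return {
        "served_entities": [
            {
                "entity_name": entity_name,        # e.g. "catalog.schema.my_model"
                "entity_version": entity_version,  # registered model version
                "workload_size": workload_size,
                "scale_to_zero_enabled": True,
            }
        ]
    }


def update_endpoint(workspace_url, endpoint_name, config):
    """PUT the new config; DATABRICKS_TOKEN must hold a personal access token."""
    req = urllib.request.Request(
        f"{workspace_url}/api/2.0/serving-endpoints/{endpoint_name}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

After the update, watch the Memory Usage chart in the Metrics tab to confirm utilization drops below the 70% threshold mentioned above.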

