Model Serving Endpoint keeps failing with SIGKILL error

AChang

I am trying to deploy a model from the Serving endpoints section, but the deployment keeps failing after about an hour of trying to create the endpoint. Here are the service logs:

Container failed with: 9 +0000] [115] [INFO] Booting worker with pid: 115
[2023-09-15 19:15:35 +0000] [2] [ERROR] Worker (pid:73) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:15:35 +0000] [119] [INFO] Booting worker with pid: 119
[2023-09-15 19:15:57 +0000] [2] [ERROR] Worker (pid:99) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:15:57 +0000] [131] [INFO] Booting worker with pid: 131
2023-09-15 19:16:05.631648: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-15 19:16:06.710808: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
[2023-09-15 19:16:07 +0000] [2] [ERROR] Worker (pid:93) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:16:07 +0000] [137] [INFO] Booting worker with pid: 137
[2023-09-15 19:16:35 +0000] [2] [ERROR] Worker (pid:119) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:16:35 +0000] [155] [INFO] Booting worker with pid: 155
[2023-09-15 19:16:42 +0000] [2] [ERROR] Worker (pid:115) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:16:42 +0000] [159] [INFO] Booting worker with pid: 159
[2023-09-15 19:17:10 +0000] [2] [ERROR] Worker (pid:131) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:17:10 +0000] [175] [INFO] Booting worker with pid: 175
[2023-09-15 19:17:17 +0000] [2] [ERROR] Worker (pid:137) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:17:17 +0000] [179] [INFO] Booting worker with pid: 179
[2023-09-15 19:17:46 +0000] [2] [ERROR] Worker (pid:159) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:17:46 +0000] [195] [INFO] Booting worker with pid: 195

Should I try moving to the largest compute, or is the issue more to do with the model itself?
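
One way to get a little more signal out of the log before changing the compute size is to measure how long each worker survives between its "Booting worker" line and its SIGKILL line. The sketch below is only an illustration: the function name `worker_lifetimes` and the regular expressions are my own, written against the log format shown above, and the sample is a subset of the lines from this post. Workers that all die after a similar, short interval would be consistent with each one being killed while it loads the model, though that is not proof of an out-of-memory condition.

```python
import re
from datetime import datetime

# Patterns written against the gunicorn-style lines in the post (assumed format).
TS = r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \+0000\]"
BOOT = re.compile(TS + r" \[\d+\] \[INFO\] Booting worker with pid: (\d+)")
KILL = re.compile(TS + r" \[\d+\] \[ERROR\] Worker \(pid:(\d+)\) was sent SIGKILL!")
FMT = "%Y-%m-%d %H:%M:%S"


def worker_lifetimes(log_text):
    """Return {pid: seconds between boot and SIGKILL} for workers with both lines."""
    booted = {}
    lifetimes = {}
    for line in log_text.splitlines():
        boot = BOOT.search(line)
        if boot:
            booted[int(boot.group(2))] = datetime.strptime(boot.group(1), FMT)
            continue
        kill = KILL.search(line)
        if kill:
            pid = int(kill.group(2))
            # Skip workers whose boot line is missing or truncated in the log.
            if pid in booted:
                killed_at = datetime.strptime(kill.group(1), FMT)
                lifetimes[pid] = (killed_at - booted[pid]).total_seconds()
    return lifetimes


# A subset of the log lines quoted in the post.
SAMPLE = """\
[2023-09-15 19:15:35 +0000] [119] [INFO] Booting worker with pid: 119
[2023-09-15 19:15:57 +0000] [131] [INFO] Booting worker with pid: 131
[2023-09-15 19:16:07 +0000] [137] [INFO] Booting worker with pid: 137
[2023-09-15 19:16:35 +0000] [2] [ERROR] Worker (pid:119) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:17:10 +0000] [2] [ERROR] Worker (pid:131) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:17:17 +0000] [2] [ERROR] Worker (pid:137) was sent SIGKILL! Perhaps out of memory?
"""

if __name__ == "__main__":
    for pid, seconds in worker_lifetimes(SAMPLE).items():
        print(f"worker {pid} lived {seconds:.0f}s")
```

On these lines the workers live for roughly 60 to 73 seconds each, and several are booted at overlapping times, so more than one worker may be loading the TensorFlow model at once.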

