I am trying to deploy a model in the serving endpoints section, but it keeps failing after attempting to create for an hour. Here are the service logs:
Container failed with: 9 +0000] [115] [INFO] Booting worker with pid: 115
[2023-09-15 19:15:35 +0000] [2] [ERROR] Worker (pid:73) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:15:35 +0000] [119] [INFO] Booting worker with pid: 119
[2023-09-15 19:15:57 +0000] [2] [ERROR] Worker (pid:99) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:15:57 +0000] [131] [INFO] Booting worker with pid: 131
2023-09-15 19:16:05.631648: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-15 19:16:06.710808: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
[2023-09-15 19:16:07 +0000] [2] [ERROR] Worker (pid:93) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:16:07 +0000] [137] [INFO] Booting worker with pid: 137
[2023-09-15 19:16:35 +0000] [2] [ERROR] Worker (pid:119) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:16:35 +0000] [155] [INFO] Booting worker with pid: 155
[2023-09-15 19:16:42 +0000] [2] [ERROR] Worker (pid:115) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:16:42 +0000] [159] [INFO] Booting worker with pid: 159
[2023-09-15 19:17:10 +0000] [2] [ERROR] Worker (pid:131) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:17:10 +0000] [175] [INFO] Booting worker with pid: 175
[2023-09-15 19:17:17 +0000] [2] [ERROR] Worker (pid:137) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:17:17 +0000] [179] [INFO] Booting worker with pid: 179
[2023-09-15 19:17:46 +0000] [2] [ERROR] Worker (pid:159) was sent SIGKILL! Perhaps out of memory?
[2023-09-15 19:17:46 +0000] [195] [INFO] Booting worker with pid: 195
Should I try moving to the largest compute, or is the issue more to do with the model itself?