The Python process exited with exit code 137 (SIGKILL: Killed). This may have been caused by an OOM error. Check your command's memory usage.
I am running a Hugging Face model on a GPU cluster (g4dn.xlarge, 16 GB memory, 4 cores). I run the same model in four different notebooks with different data sources. I created a workflow to run one model after the other. These notebooks run fine indi...
Latest Reply
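You might be tracking gradients when running your Hugging Face model. Every forward pass then keeps its autograd graph and intermediate tensors alive, which typically leads to out-of-memory errors after some iterations. If you use the model for inference only, wrap the call in torch.no_grad():
with torch.no_grad():
    # The code where you apply the model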
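To make the suggestion concrete, here is a minimal sketch of the difference. It assumes PyTorch is installed and uses a small `nn.Linear` layer as a hypothetical stand-in for the actual Hugging Face model, which is not shown in the question; the names `model`, `batch`, `tracked`, and `untracked` are illustrative only.

```python
import torch
from torch import nn

# Hypothetical stand-in for the Hugging Face model; any nn.Module
# behaves the same way with respect to gradient tracking.
model = nn.Linear(8, 2)
model.eval()  # switch layers such as dropout to inference behavior

batch = torch.randn(4, 8)

# Default behavior: the output is attached to an autograd graph,
# so tensors needed for a backward pass are kept in memory.
tracked = model(batch)

# Inside no_grad, no graph is recorded for the forward pass.
with torch.no_grad():
    untracked = model(batch)

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
```

If outputs are collected in a list across many batches, storing the `untracked` kind (or calling `.detach().cpu()` on them) avoids holding graphs and GPU tensors for the whole run. Whether this is the cause of the exit code 137 here is an assumption; the kill may also come from host memory rather than GPU memory.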