The Python process exited with exit code 137 (SIGKILL: Killed). This may have been caused by an OOM error. Check your command's memory usage.
12-21-2022 06:47 PM
I am running a Hugging Face model on a GPU cluster (g4dn.xlarge, 16 GB memory, 4 cores). I run the same model in four different notebooks with different data sources, and I created a workflow to run the notebooks one after the other. The notebooks run fine individually, but in the workflow setup I get a fatal error: The Python kernel is unresponsive (The Python process exited with exit code 137 (SIGKILL: Killed). This may have been caused by an OOM error. Check your command's memory usage.).
12-21-2022 11:25 PM
It could be due to caching, which may hold on to some memory when you're reusing the cluster.
Try increasing your memory and/or optimizing your code a bit.
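If the workflow reuses the same cluster for all four notebooks, each run can leave the previous model and its cached tensors in memory. A minimal cleanup sketch, assuming a PyTorch-based Hugging Face model held in a variable named model (a placeholder, not necessarily the name used in the notebooks), that could be run at the end of each notebook:

import gc
import torch

# `model` is a stand-in for whatever Hugging Face model object the notebook created
try:
    del model
except NameError:
    pass

gc.collect()                   # free Python-side objects that still hold tensors
if torch.cuda.is_available():
    torch.cuda.empty_cache()   # return cached GPU memory to the CUDA driver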
01-04-2023 05:54 PM
I am not processing a big batch of data: just five text documents of roughly 1,000 characters each. I am using the GPU to run the transformer model, so the model itself is not really running on the CPU. That's why it is strange to get an OOM error with so little data being processed on the CPU.
12-27-2022 03:25 PM
You can check the executor logs to narrow down the error if you'd like, but technically this is an OOM, and increasing your cluster's resources will mitigate the issue.
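A hedged way to narrow it down without digging through the logs is to print memory usage at the start and end of each notebook; psutil reports host (driver) memory, which is usually what triggers exit code 137, and torch reports GPU memory. A rough sketch only:

import psutil
import torch

vm = psutil.virtual_memory()
print(f"host RAM used: {vm.used / 1e9:.1f} GB of {vm.total / 1e9:.1f} GB")

if torch.cuda.is_available():
    print(f"GPU allocated: {torch.cuda.memory_allocated() / 1e9:.1f} GB, "
          f"reserved: {torch.cuda.memory_reserved() / 1e9:.1f} GB")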
03-27-2023 02:12 AM
You might be accumulating gradients when running your Hugging Face model, which typically leads to out-of-memory errors after some iterations. If you use the model for inference only, wrap the call in torch.no_grad():

import torch

with torch.no_grad():
    # the code where you apply the model
    ...
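For context, a fuller inference sketch under torch.no_grad(); the checkpoint name and input text below are placeholders, not the ones from the original notebooks:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to("cuda").eval()

texts = ["short example document"]                 # stand-in for the five text files
inputs = tokenizer(texts, return_tensors="pt", truncation=True).to("cuda")

with torch.no_grad():                              # no gradient buffers are kept around
    logits = model(**inputs).logits

print(logits.argmax(dim=-1))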

