Databricks Community

Koliya · ‎12-21-2022

I am running a Hugging Face model on a GPU cluster (g4dn.xlarge, 16 GB memory, 4 cores). I run the same model in four different notebooks, each with a different data source, and I created a workflow to run the notebooks one after another. They run fine individually, but in the workflow setup I get: Fatal error: The Python kernel is unresponsive (The Python process exited with exit code 137 (SIGKILL: Killed). This may have been caused by an OOM error. Check your command's memory usage.).

daniel_sahal · ‎12-21-2022

It could be due to caching, which may use some memory when you're reusing the cluster.

Try increasing the cluster's memory and/or optimizing your code a bit.
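
One way to act on the "optimize your code" suggestion is to release the model explicitly at the end of each notebook, so the next task in the workflow starts with less left over. A minimal sketch, assuming PyTorch is the backend; `release_memory` is a hypothetical helper (not a Databricks or Hugging Face API), and `model` and `tokenizer` are placeholders for whatever the notebook created:

```python
import gc

try:
    import torch  # optional: only needed for the GPU cache step
except ImportError:
    torch = None


def release_memory():
    """Run garbage collection and, if CUDA is available, release cached GPU blocks.

    Returns the number of unreachable objects the garbage collector found.
    """
    collected = gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Frees cached, unused GPU memory; tensors that are still referenced stay allocated.
        torch.cuda.empty_cache()
    return collected


# At the end of a notebook: drop your own references first, then collect.
# These two objects are stand-ins for the real model and tokenizer.
model, tokenizer = object(), object()
del model, tokenizer
freed = release_memory()
print(f"garbage collector found {freed} unreachable objects")
```

This only helps if nothing else still holds a reference to the model (for example a pipeline object or a global variable), so it may not be sufficient on its own.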

Koliya · ‎01-04-2023

I am not processing a big batch of data. It's just five text documents of fewer than 1,000 characters each. I am using the GPU to run the transformer model, so the model itself is not really running on the CPU. That's why it is odd to get an OOM error when so little data is being processed on the CPU.

jose_gonzalez · ‎12-27-2022

You can check the executor logs to narrow down the error if you like, but technically this is an OOM, and increasing your cluster's resources will mitigate the issue.
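
Besides the logs, it can help to print host memory at a few checkpoints inside each notebook. Exit code 137 is 128 + 9, i.e. the process received SIGKILL, which typically points at host RAM rather than GPU memory (a CUDA out-of-memory condition normally surfaces as a Python exception instead). A rough sketch using only the standard library; `peak_rss_mb` and `log_memory` are hypothetical helper names, and the `resource` module is Unix-only:

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of the current process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(label):
    """Print and return the peak host memory at a named checkpoint."""
    mb = peak_rss_mb()
    print(f"[{label}] peak host RSS: {mb:.1f} MiB")
    return mb


before = log_memory("start")
buffer = bytearray(50 * 1024 * 1024)  # stand-in for loading a model or a dataset
after = log_memory("after allocation")
```

Calling `log_memory` after the model load, after tokenization, and after inference in each notebook should show which step pushes the process toward the 16 GB limit.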

fkemeth · ‎03-27-2023

You might be tracking gradients when running your Hugging Face model, which typically leads to out-of-memory errors after some iterations. If you use it for inference only, do

import torch

with torch.no_grad():
    outputs = model(**inputs)  # the code where you apply the model; `model` and `inputs` are placeholders
