Hi @thomas_berry
This is a well-known issue with transformers and torch in environments that use forked processes or multiprocessing under the hood, which is exactly what Databricks executors and serverless compute do.
Root cause: TrainingArguments triggers PyTorch's distributed training initialization code, which tries to detect available hardware and set up process groups. In Databricks (both classic and serverless), this spawns or probes subprocesses that deadlock because the Spark executor environment intercepts or blocks certain POSIX signals and fork behaviors. Your laptop doesn't have this problem because it's a clean single-process Python environment.
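To confirm you're in this situation, a quick stdlib-only check of the process model is harmless to run anywhere (nothing here touches torch or transformers):

```python
import multiprocessing
import threading

# "fork" is the default start method on Linux; forked children inherit
# locks from the parent, which is the usual source of these deadlocks.
print("start method:", multiprocessing.get_start_method())
print("live threads before heavy imports:", threading.active_count())
```

If you already see several live threads before importing torch, a fork after that point is risky.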
The fix: Set the following environment variables before importing anything from transformers or torch. The key ones are RANK and WORLD_SIZE, which tell PyTorch it is a single-process world with no process group to set up:
```python
import os

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "12355"
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["TORCHELASTIC_ERROR_FILE"] = "/tmp/torch_error.json"
# Note: TORCH_DISTRIBUTED_DEBUG only controls logging verbosity
# (OFF/INFO/DETAIL); it does not disable distributed initialization.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "OFF"
os.environ["OMP_NUM_THREADS"] = "1"
```
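If you'd rather not clobber anything the platform may have already injected, `os.environ.setdefault` is a safe variant (a minimal sketch using the same variable names as above):

```python
import os

# Apply each default only if the variable is not already set, so any
# value the platform injected (e.g. a real MASTER_PORT) wins.
defaults = {
    "MASTER_ADDR": "localhost",
    "MASTER_PORT": "12355",
    "RANK": "0",
    "WORLD_SIZE": "1",
    "OMP_NUM_THREADS": "1",
}
for key, value in defaults.items():
    os.environ.setdefault(key, value)
```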
Then your import and instantiation:
```python
from transformers import TrainingArguments

print("start")
args = TrainingArguments(output_dir="test", no_cuda=True)
print("end")
```
Why no_cuda=True matters here too: Even without a GPU, TrainingArguments probes CUDA device availability via torch.cuda, which can trigger another hang in Databricks serverless (DBR base v5 / Python 3.12) because the CUDA stub libraries behave differently inside the sandboxed execution environment. (On recent transformers releases, no_cuda is deprecated in favor of use_cpu=True; it has the same effect here.)
If you're on serverless specifically, add this as well, since it prevents the tokenizers library (a transitive dependency) from spawning its own threads:
```python
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```
Cleanest pattern for a notebook cell:
```python
import os

os.environ.update({
    "MASTER_ADDR": "localhost",
    "MASTER_PORT": "12355",
    "RANK": "0",
    "WORLD_SIZE": "1",
    "OMP_NUM_THREADS": "1",
    "TOKENIZERS_PARALLELISM": "false",
})

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="/tmp/model_output",
    no_cuda=True,
)
print("TrainingArguments initialized successfully")
```
LR