How to allocate more memory to the GPU when training in a Databricks notebook

varun-adi
Databricks Partner

I am trying to train a HuBERT model, specifically facebook/hubert-base-ls960, on a custom speech dataset.

The training parameters are below:

trainer_config = {
  "OUTPUT_DIR": "results",
  "TRAIN_EPOCHS": 6,
  "TRAIN_BATCH_SIZE": 2,
  "EVAL_BATCH_SIZE": 2,
  "GRADIENT_ACCUMULATION_STEPS": 4,
  "WARMUP_STEPS": 500,
  "DECAY": 0.01,
  "INITIAL_LOGGING_STEPS": 10,  # Smaller value for initial logging
  "LOGGING_STEPS": 100,  # Larger value for subsequent logging
  "MODEL_DIR": "/dbfs/FileStore/wav-files/personalityDataset/Augmented-HubertModel7Epochs",
  "SAVE_STEPS": 100
}

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=trainer_config["OUTPUT_DIR"],
    gradient_accumulation_steps=trainer_config["GRADIENT_ACCUMULATION_STEPS"],
    num_train_epochs=trainer_config["TRAIN_EPOCHS"],
    per_device_train_batch_size=trainer_config["TRAIN_BATCH_SIZE"],
    per_device_eval_batch_size=trainer_config["EVAL_BATCH_SIZE"],
    warmup_steps=trainer_config["WARMUP_STEPS"],
    save_steps=trainer_config["SAVE_STEPS"],
    weight_decay=trainer_config["DECAY"],
    evaluation_strategy="epoch",  # Report metrics at the end of each epoch
    logging_steps=trainer_config["INITIAL_LOGGING_STEPS"],  # Initial logging frequency
    fp16=True  # Enable mixed-precision training
)
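
With these settings, the effective batch size per optimizer step is the per-device batch size times the gradient accumulation steps times the number of GPUs. Only the per-device batch size drives peak activation memory; accumulation steps do not. A minimal sketch of that arithmetic, assuming a single GPU. The helper names and the lower_memory_kwargs dict are illustrative and not part of the original setup, although the keys are existing TrainingArguments parameters:

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def accumulation_for(target_effective, per_device_batch, num_devices=1):
    """Accumulation steps needed to keep a target effective batch size."""
    per_step = per_device_batch * num_devices
    if target_effective % per_step != 0:
        raise ValueError("target is not a multiple of the per-step batch")
    return target_effective // per_step


# Values from trainer_config above: batch size 2, accumulation 4, one GPU (assumed).
current = effective_batch_size(2, 4)

# Dropping to batch size 1 keeps the same effective batch with more accumulation.
steps_at_batch_1 = accumulation_for(current, per_device_batch=1)

# Hypothetical lower-memory overrides; gradient checkpointing trades compute for memory.
lower_memory_kwargs = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": steps_at_batch_1,
    "gradient_checkpointing": True,
    "fp16": True,
}

print(current, steps_at_batch_1)
```

These overrides would be passed as TrainingArguments(..., **lower_memory_kwargs) in place of the corresponding arguments; whether they are enough for this dataset is not something the numbers above can confirm.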


Running the nvidia-smi command gives the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

We tried expanding the cluster memory to 32 GB. The current cluster configuration is:

1-2 Workers: 32-64 GB Memory, 8-16 Cores
1 Driver: 32 GB Memory, 8 Cores
Runtime: 13.1.x-gpu-ml-scala2.12

However, the memory available to the GPU is still only ~16 GB.
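
Note that the ~16 GB figure is the on-board memory of the Tesla T4 itself, which is separate from the driver and worker RAM configured on the cluster. A minimal sketch for checking what PyTorch sees from the notebook, assuming PyTorch is installed (the helper name gpu_memory_report is made up for illustration):

```python
def gpu_memory_report(device_index=0):
    """Return GPU memory figures in GiB, or None if no CUDA GPU is usable."""
    try:
        import torch  # assumption: PyTorch is available, as on a GPU ML runtime
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    gib = 1024 ** 3
    # mem_get_info returns (free, total) device memory in bytes.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    return {
        "name": torch.cuda.get_device_name(device_index),
        "total_gib": total_bytes / gib,
        "free_gib": free_bytes / gib,
        # Memory held by tensors vs. memory held by PyTorch's caching allocator.
        "allocated_gib": torch.cuda.memory_allocated(device_index) / gib,
        "reserved_gib": torch.cuda.memory_reserved(device_index) / gib,
    }


report = gpu_memory_report()
print(report if report is not None else "No CUDA GPU visible from this process")
```

The total reported this way is fixed by the GPU model, so it should not change when the cluster's driver or worker memory is resized.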

Because of this, training fails with the error below:
OutOfMemoryError: CUDA out of memory. Tried to allocate 530.00 MiB (GPU 0; 14.76 GiB total capacity; 12.87 GiB already allocated; 411.75 MiB free; 13.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
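
The figures in this message can be compared directly to see whether the fragmentation hint applies. A small sketch, using only the standard library, that parses the message above (the helper name summarize_oom is made up for illustration):

```python
import re

ERROR = (
    "CUDA out of memory. Tried to allocate 530.00 MiB (GPU 0; 14.76 GiB total "
    "capacity; 12.87 GiB already allocated; 411.75 MiB free; 13.26 GiB reserved "
    "in total by PyTorch)"
)

_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def _gib(pattern, text):
    """Extract '<number> <MiB|GiB>' matched by pattern and convert to GiB."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return float(match.group(1)) * _TO_GIB[match.group(2)]


def summarize_oom(text):
    """Pull the memory figures out of a PyTorch CUDA OOM message."""
    return {
        "requested_gib": _gib(r"Tried to allocate ([\d.]+) (MiB|GiB)", text),
        "total_gib": _gib(r"([\d.]+) (MiB|GiB) total capacity", text),
        "allocated_gib": _gib(r"([\d.]+) (MiB|GiB) already allocated", text),
        "free_gib": _gib(r"([\d.]+) (MiB|GiB) free", text),
        "reserved_gib": _gib(r"([\d.]+) (MiB|GiB) reserved", text),
    }


summary = summarize_oom(ERROR)
# Memory PyTorch has cached but is not currently using for tensors.
cached_unused_gib = summary["reserved_gib"] - summary["allocated_gib"]
print(f"cached but unused: {cached_unused_gib:.2f} GiB")
print(f"request exceeds free memory: {summary['requested_gib'] > summary['free_gib']}")
```

Here reserved exceeds allocated by only about 0.39 GiB, and the 530 MiB request is larger than the 411.75 MiB free, so the card looks genuinely full rather than fragmented. Under that reading, setting max_split_size_mb is unlikely to recover much memory.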

I also tried reducing the batch size to 1, but the same error persists.
How can I make more memory available to CUDA and the training process when training from a notebook?