<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to allocate more memory to GPU when training through databricks notebook in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/how-to-allocate-more-memory-to-gpu-when-training-through/m-p/45614#M2341</link>
    <description>&lt;P&gt;Training &lt;STRONG&gt;facebook/hubert-base-ls960&lt;/STRONG&gt; on a custom speech dataset in a Databricks notebook fails with a CUDA out-of-memory error on a Tesla T4; expanding the cluster memory to 32&amp;nbsp;GB did not increase the memory available to the GPU. How can more memory be made available to CUDA when training through a notebook?&lt;/P&gt;</description>
    <pubDate>Fri, 22 Sep 2023 07:21:13 GMT</pubDate>
    <dc:creator>varun-adi</dc:creator>
    <dc:date>2023-09-22T07:21:13Z</dc:date>
    <item>
      <title>How to allocate more memory to GPU when training through databricks notebook</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-allocate-more-memory-to-gpu-when-training-through/m-p/45614#M2341</link>
      <description>&lt;P&gt;I am trying to train a HuBERT model, specifically&amp;nbsp;&lt;STRONG&gt;facebook/hubert-base-ls960&lt;/STRONG&gt;, on a custom speech dataset.&lt;/P&gt;&lt;P&gt;The training parameters are:&lt;/P&gt;&lt;PRE&gt;trainer_config = {
  "OUTPUT_DIR": "results",
  "TRAIN_EPOCHS": 6,
  "TRAIN_BATCH_SIZE": 2,
  "EVAL_BATCH_SIZE": 2,
  "GRADIENT_ACCUMULATION_STEPS": 4,
  "WARMUP_STEPS": 500,
  "DECAY": 0.01,
  "INITIAL_LOGGING_STEPS": 10,  # Smaller value for initial logging
  "LOGGING_STEPS": 100,  # Larger value for subsequent logging
  "MODEL_DIR": "/dbfs/FileStore/wav-files/personalityDataset/Augmented-HubertModel7Epochs",
  "SAVE_STEPS": 100
}

training_args = TrainingArguments(
    output_dir=trainer_config["OUTPUT_DIR"],
    gradient_accumulation_steps=trainer_config["GRADIENT_ACCUMULATION_STEPS"],
    num_train_epochs=trainer_config["TRAIN_EPOCHS"],
    per_device_train_batch_size=trainer_config["TRAIN_BATCH_SIZE"],
    per_device_eval_batch_size=trainer_config["EVAL_BATCH_SIZE"],
    warmup_steps=trainer_config["WARMUP_STEPS"],
    save_steps=trainer_config["SAVE_STEPS"],
    weight_decay=trainer_config["DECAY"],
    evaluation_strategy="epoch",  # Report metrics at the end of each epoch
    logging_steps=trainer_config["INITIAL_LOGGING_STEPS"],  # Initial logging frequency
    fp16=True  # Enable mixed-precision training
)&lt;/PRE&gt;&lt;P&gt;Running the&amp;nbsp;nvidia-smi command yields the output below:&lt;/P&gt;&lt;PRE&gt;+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01    Driver Version: 470.103.01    CUDA Version: 11.4   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+&lt;/PRE&gt;&lt;P&gt;We tried expanding the cluster memory to 32&amp;nbsp;GB; the current cluster configuration is:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1-2 Workers: 32-64&amp;nbsp;GB Memory, 8-16&amp;nbsp;Cores&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;1 Driver: 32&amp;nbsp;GB Memory,&amp;nbsp;8&amp;nbsp;Cores&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Runtime: 13.1.x-gpu-ml-scala2.12&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;However, the memory allocated to the GPU is still only ~16&amp;nbsp;GB.&lt;/P&gt;&lt;P&gt;Because of this, training fails with the following error:&lt;BR /&gt;&lt;EM&gt;OutOfMemoryError: CUDA out of memory. Tried to allocate 530.00 MiB (GPU 0; 14.76 GiB total capacity; 12.87 GiB already allocated; 411.75 MiB free; 13.26 GiB reserved in total by PyTorch) If reserved memory is &amp;gt;&amp;gt; allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;I also tried reducing the batch size to 1, but the same error persists.&lt;BR /&gt;How can I ensure that more memory is available to CUDA and the training process when running through a notebook?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Sep 2023 07:21:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-allocate-more-memory-to-gpu-when-training-through/m-p/45614#M2341</guid>
      <dc:creator>varun-adi</dc:creator>
      <dc:date>2023-09-22T07:21:13Z</dc:date>
    </item>
  </channel>
</rss>

