CUDA out of memory

gary7135

I am trying out the new Meta Llama 2 model, following the Databricks-provided notebook example: https://github.com/databricks/databricks-ml-examples/blob/master/llm-models/llamav2/llamav2-13b/01_l...

 

I keep getting a CUDA out-of-memory error. My GPU cluster runtime is 13.2 ML (includes Apache Spark 3.4.0, GPU, Scala 2.12), with 256 GB of memory and 1 GPU.

 

Error message:

CUDA out of memory. Tried to allocate 314.00 MiB (GPU 0; 14.76 GiB total capacity; 13.50 GiB already allocated; 313.75 MiB free; 13.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

 

 

What would be a good way to solve this issue?
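
For context, here is a rough sketch of what the numbers in the error message seem to imply. It only does arithmetic on the figures quoted above and does not touch the GPU. Two things in it are assumptions, not facts from the notebook: the parameter count is taken as a nominal 13 billion (guessed from "13b" in the notebook path), and the bytes-per-parameter values are the usual sizes for fp32, fp16/bf16, 8-bit and 4-bit weights. Activations, the KV cache and framework overhead are ignored, so real usage would be higher.

```python
# Figures copied from the CUDA error message in this post.
TOTAL_GIB = 14.76       # GPU 0 total capacity
ALLOCATED_GIB = 13.50   # already allocated
RESERVED_GIB = 13.51    # reserved in total by PyTorch
FREE_MIB = 313.75       # free at the time of the failure
REQUEST_MIB = 314.00    # size of the allocation that failed

# ASSUMPTION: nominal parameter count, inferred from "13b" in the notebook path.
ASSUMED_PARAMS = 13e9

# ASSUMPTION: typical storage size per weight for each precision.
ASSUMED_BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def fragmentation_gap_gib() -> float:
    """Reserved minus allocated memory; a large gap would hint at fragmentation."""
    return RESERVED_GIB - ALLOCATED_GIB


def weights_gib(bytes_per_param: float, params: float = ASSUMED_PARAMS) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return params * bytes_per_param / 2**30


def fits_on_gpu(bytes_per_param: float) -> bool:
    """True if the weights alone are smaller than the GPU's total capacity."""
    return weights_gib(bytes_per_param) < TOTAL_GIB


if __name__ == "__main__":
    print(f"reserved - allocated = {fragmentation_gap_gib():.2f} GiB")
    print(f"request exceeds free memory: {REQUEST_MIB > FREE_MIB}")
    for name, size in ASSUMED_BYTES_PER_PARAM.items():
        verdict = "fits" if fits_on_gpu(size) else "does not fit"
        print(f"{name:>9}: ~{weights_gib(size):.1f} GiB of weights, {verdict} in {TOTAL_GIB} GiB")
```

If these assumptions hold, reserved and allocated memory are almost equal, so the max_split_size_mb hint in the message probably would not help much here, and half-precision weights alone would be larger than the 14.76 GiB the GPU reports. Note also that the 256 GB in the cluster description is presumably host memory, not GPU memory.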