I am trying to fine-tune a llama2_lora model using the xTuring library (batch size is 1). I am working on a cluster with 1 worker (28 GB memory, 4 cores) and 1 driver (110 GB memory, 16 cores).
I am facing this error: OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 15.57 GiB total capacity; 8.02 GiB already allocated; 57.44 MiB free; 8.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
It says that the total capacity is 15.57 GiB. Does this memory correspond to either the worker or the driver memory? If yes, shouldn't it be more than 15.57 GiB? Is the current implementation unable to utilize the available memory?
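For context, here is a small sketch of how the numbers in that error can be inspected directly from PyTorch (this assumes torch is installed and a CUDA device is visible; the values reported here describe the GPU's own memory, not the worker or driver RAM):

```python
import torch

# Report the total, allocated, and reserved memory on GPU 0 (values in GiB).
# These correspond to the "total capacity", "already allocated", and
# "reserved in total by PyTorch" figures in the OutOfMemoryError message.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 2**30
    allocated_gib = torch.cuda.memory_allocated(0) / 2**30
    reserved_gib = torch.cuda.memory_reserved(0) / 2**30
    print(f"{props.name}: total={total_gib:.2f} GiB, "
          f"allocated={allocated_gib:.2f} GiB, reserved={reserved_gib:.2f} GiB")
else:
    print("No CUDA device visible to PyTorch")
```

Running this on the cluster should show whether the 15.57 GiB figure matches the GPU attached to the worker.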