04-13-2023 10:48 AM
Hi, I am new to LLMs and curious to try one out. I ran the following code from the Databricks website:
import torch
from transformers import pipeline
instruct_pipeline = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
and it seems to be downloading a 24 GB model file every time the cluster is restarted:
Downloading (…)pytorch_model.bin: 100% - 23.8G/23.8G [02:39<00:00, 128MB/s]
Is there a way (and where can I find the instructions) to store the pytorch_model.bin file "locally" so it isn't downloaded every time the cluster is restarted?
Add-on question: what's a decent cluster config to test things out? So far I've been testing with g4dn.2xlarge (32 GB, 1 GPU) on 12.2 LTS ML (GPU), and I'm getting a CUDA out-of-memory error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 492.00 MiB (GPU 0; 14.76 GiB total capacity; 13.52 GiB already allocated; 483.75 MiB free; 13.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
04-15-2023 05:54 PM
@H T: I don't have a specific answer for Dolly right now, but I can give you a framework to think about it, for you to test and try.
To avoid downloading the model every time the cluster is restarted, you can copy the downloaded model files to your Databricks workspace or to a cloud storage account and load them from there instead of from the Hugging Face Hub. You can do this by pointing the model argument at the directory containing the model files (the pipeline expects a model directory, not the pytorch_model.bin file itself):
instruct_pipeline = pipeline(model="/path/to/local/model", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
As for the cluster configuration, it depends on the size of your data and the complexity of your models. For testing purposes, you can start with a smaller instance size and scale up as needed. You can also try the max_split_size_mb setting to work around the CUDA out-of-memory error. Note that this is a PyTorch CUDA allocator option, set through the PYTORCH_CUDA_ALLOC_CONF environment variable rather than passed to the pipeline; blocks larger than this size (in MB) will not be split by the caching allocator, which can help with fragmentation-related OOMs at some performance cost.
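A minimal sketch of setting it (the 512 value is illustrative, not a recommendation; it must take effect before the first CUDA allocation):
import os

# Set before torch initializes CUDA, ideally before importing torch at all
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch
from transformers import pipeline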
05-16-2023 07:02 AM
@Suteja Kanuri -
Hi,
Thanks for responding. I've tried your suggestion, but got an error:
"ValueError: The following `model_kwargs` are not used by the model: ['max_split_size_mb'] (note: typos in the generate arguments will also show up in this list)"
Specifically, I am testing the demo Databricks provided (https://www.dbdemos.ai/, llm-dolly-chatbot), and I get this error in 03-Q&A-prompt-engineering-for-dolly, in the build_qa_chain() function when pipeline is called.
Thoughts?
05-16-2023 08:46 AM
@Suteja Kanuri Update - I was able to get it working by upgrading to a g4dn.12xlarge node (4 GPUs).
However, the code in 02-Data-preparation that applies the sshleifer/distilbart-cnn-12-6 model for a summarization task failed on the more powerful node (while it worked fine with just a single GPU). Do you have any suggestions there?
I set repartition to 4 since there were 4 GPUs. docs_limit_df has 4 rows.
torch.cuda.empty_cache()
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", device_map="auto")
docs_limit_df = docs_limit_df.repartition(4).withColumn("text_short", summarize_all("text"))
The error I got was:
"PythonException: 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!', from <command-434574176370212>, line 8. Full traceback below:"
04-15-2023 10:25 PM
Hi @H T,
Hope everything is going great.
Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please let us know so we can help you.
Cheers!
06-02-2023 11:48 AM
Just set the HF cache dir to a persistent path on /dbfs (before transformers is imported, so the setting takes effect):
import os
os.environ['TRANSFORMERS_CACHE'] = "/dbfs/..."
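For example (the cache path below is a placeholder; any persistent /dbfs location works):
import os

# Point the Hugging Face cache at persistent storage so downloads survive restarts
os.environ["TRANSFORMERS_CACHE"] = "/dbfs/tmp/hf_cache"

import torch
from transformers import pipeline

# Later restarts reuse the cached weights instead of re-downloading 23.8 GB
instruct_pipeline = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")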