
Prakash Hinduja Geneva, Switzerland, How do I fine-tune a large language model (LLM) in Databricks?

prakashhinduja
New Contributor III

Hello Databricks Community,

I am Prakash Hinduja from Geneva, Switzerland (Swiss), currently exploring fine-tuning large language models (LLMs) in Databricks and would appreciate any guidance or suggestions from those with experience in this area.

 

Regards

Prakash Hinduja Geneva, Switzerland (Swiss) 


Khaja_Zaffer
Esteemed Contributor

Hello Prakash 

You can start from here: https://www.databricks.com/blog/fine-tuning-large-language-models-hugging-face-and-deepspeed

There is also an ongoing Databricks course you can register for here: https://uplimit.com/course/databricks-genai-with-databricks

Khaja_Zaffer
Esteemed Contributor

Hello @prakashhinduja 

Can you acknowledge the solution provided? You keep asking the same question. 🙂

jayshan
New Contributor III

@Khaja_Zaffer Thank you for providing those links. However, the materials there are outdated and difficult to replicate. It gives me the impression that LLM fine-tuning is no longer a priority for Databricks Mosaic AI. Can any expert confirm that?

anmolhhns
New Contributor

Hi @jayshan, you're right, the material is outdated now. The more current path is Foundation Model Fine-tuning, which is part of Mosaic AI Model Training. You can run it from the Databricks UI or with the databricks_genai SDK, point it at training data in Unity Catalog, pick a task type like CHAT_COMPLETION, INSTRUCTION_FINETUNE, or CONTINUED_PRETRAIN, and it will register the fine-tuned model back to Unity Catalog for serving. Just note it's still in Public Preview and limited to certain regions. Here is the official doc: https://docs.databricks.com/aws/en/large-language-models/foundation-model-training/
If you need more control than the managed API offers, you can also spin up a GPU cluster on the Databricks ML Runtime and use libraries like peft (for LoRA), TRL, DeepSpeed, Unsloth, or Axolotl for custom workflows; a rough sketch of that path is below.
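For example, a minimal LoRA fine-tune with TRL might look roughly like this. Treat it as a sketch only: it assumes a GPU cluster on the ML Runtime with recent versions of trl, peft, and datasets installed; the model name, Volume path, and hyperparameters are placeholders; and the exact SFTTrainer arguments differ between trl versions.

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Chat-format JSONL (same "messages" shape the managed API expects), read from a UC Volume -- placeholder path
dataset = load_dataset("json", data_files="/Volumes/<catalog>/<schema>/<volume>/train.jsonl", split="train")

# Train a small LoRA adapter instead of updating all model weights
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder; any causal LM you have access to
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="/local_disk0/lora-out",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()

The trade-off is that you manage GPUs, checkpoints, and MLflow/Unity Catalog registration yourself, whereas the managed API handles all of that for you.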

jayshan
New Contributor III

Thanks, @anmolhhns. I tried those options too, but unfortunately they also seem outdated. I created a workspace in East-1, but didn't see the option for Foundation Model Fine-tuning in the Experiments interface. I also tried to run the code in https://docs.databricks.com/aws/en/large-language-models/foundation-model-training/, but the referenced models are no longer available. Any updated example code would be very helpful!

Hi @jayshan, you're right that it doesn't show up in the Experiments tab. It's accessed through the databricks_genai SDK directly from a notebook; the Experiments tab will only show your run after you launch one via the SDK.

I just ran it in my workspace and it works; here's a quick setup you can try:

%pip install databricks_genai
# Restart Python so the freshly installed package can be imported
dbutils.library.restartPython()
from databricks.model_training import foundation_model as fm

# 1. First, check what models are actually available in your region/workspace
for m in fm.get_models():
    print(m.name)

In my workspace this returned Llama 3.1 8B / 8B-Instruct / 70B / 70B-Instruct, Llama 3.2 1B / 1B-Instruct / 3B / 3B-Instruct, and Llama 3.3 70B-Instruct. So meta-llama/Llama-3.2-3B-Instruct from the docs example is still valid; if you got "model not available" earlier, it might be a region issue or an outdated SDK version.

# 2. Launch a small training run (point train_data_path to a JSONL in a Unity Catalog Volume)
run = fm.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    train_data_path="/Volumes/<your_catalog>/<schema>/<volume>/train.jsonl",
    task_type="CHAT_COMPLETION",
    register_to="<your_catalog>.<schema>.<model_name>",
    training_duration="1ep",
    learning_rate="5e-7",
)

print(run.name, run.status)

The training data should be a JSONL file with chat-format rows like:

{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}

Once you call fm.create(), the run will show up in the Experiments tab as an MLflow run. If you still hit "model not available" after running get_models(), share the exact error and we can dig into it.
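For monitoring, and assuming the SDK helpers are still the ones shown in the current docs (worth double-checking against your installed version), you can poll the run from the same notebook:

from databricks.model_training import foundation_model as fm

fm.list()           # recent fine-tuning runs in the workspace
fm.get_events(run)  # event log for the run returned by fm.create() above
# fm.cancel(run)    # stop it if this was only a test launch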