07-16-2025 03:01 AM
Hello Databricks Community,
I am Prakash Hinduja from Geneva, Switzerland (Swiss), currently exploring fine-tuning large language models (LLMs) in Databricks and would appreciate any guidance or suggestions from those with experience in this area.
Regards
Prakash Hinduja Geneva, Switzerland (Swiss)
07-16-2025 03:45 AM
Hello Prakash
You can start it from here: https://www.databricks.com/blog/fine-tuning-large-language-models-hugging-face-and-deepspeed
There is also an ongoing Databricks course you can register for here:
https://uplimit.com/course/databricks-genai-with-databricks
07-16-2025 07:26 AM
Hello @prakashhinduja
Could you acknowledge the solution provided? You keep asking the same question.
Tuesday
@Khaja_Zaffer Thank you for providing those links, but the materials there are outdated and difficult to replicate. That gives me the impression that LLM fine-tuning is no longer a priority for Databricks Mosaic AI. Can any expert confirm that?
Tuesday
Hi @jayshan, you're right, that material is outdated now. The more current path is Foundation Model Fine-tuning, which is part of Mosaic AI Model Training. You can run it from the Databricks UI or with the databricks_genai SDK: point it at training data in Unity Catalog, pick a task type (CHAT_COMPLETION, INSTRUCTION_FINETUNE, or CONTINUED_PRETRAIN), and it will register the fine-tuned model back to Unity Catalog for serving. Just note it's still in Public Preview and limited to certain regions. Here is the official doc: https://docs.databricks.com/aws/en/large-language-models/foundation-model-training/
If you need more control than the managed API offers, you can also spin up a GPU cluster with the Databricks Runtime for Machine Learning and use libraries like LoRA (via PEFT), TRL, DeepSpeed, Unsloth, or Axolotl for custom fine-tuning workflows.
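As background on why those libraries make custom fine-tuning cheap: LoRA freezes the pretrained weights and learns only a low-rank update. Here's a minimal numpy sketch of that idea (toy dimensions, not real model weights, and purely illustrative):

```python
import numpy as np

# LoRA idea: instead of updating the full weight matrix W (d_out x d_in),
# learn a low-rank update B @ A with rank r << min(d_out, d_in).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 16, 32, 4, 8

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # zero-initialized, so the update starts at 0

delta = (alpha / r) * (B @ A)          # effective weight update
W_adapted = W + delta

# Trainable parameters drop from d_out*d_in to r*(d_in + d_out)
full_params = d_out * d_in             # 512
lora_params = r * (d_in + d_out)       # 192
print(full_params, lora_params)
```

In a real run, only A and B receive gradients, which is why a single GPU can fine-tune models whose full weights would not fit in optimizer memory.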
Wednesday
Thanks, @anmolhhns. I tried those options too, but unfortunately they also seem outdated. I created a workspace in East-1 but didn't see the option for Foundation Model Fine-tuning in the Experiments interface. I tried to run the code in https://docs.databricks.com/aws/en/large-language-models/foundation-model-training/, but the referenced models are no longer available. Any updated code example would be very helpful!
Wednesday
Hi @jayshan , you're right that it doesn't show up in the Experiments tab. It's accessed through the databricks_genai SDK directly from a notebook. The Experiments tab will only show your run after you launch one via the SDK.
I just ran it in my workspace and it works, here's a quick setup you can try:
%pip install databricks_genai
dbutils.library.restartPython()
from databricks.model_training import foundation_model as fm
# 1. First, check what models are actually available in your region/workspace
for m in fm.get_models():
    print(m.name)

In my workspace this returned: Llama 3.1 8B / 8B-Instruct / 70B / 70B-Instruct, Llama 3.2 1B / 1B-Instruct / 3B / 3B-Instruct, and Llama 3.3 70B-Instruct. So meta-llama/Llama-3.2-3B-Instruct from the docs example is still valid; if you got "model not available" earlier, it might be a region issue or the SDK version.
# 2. Launch a small training run (point train_data_path to a JSONL in a Unity Catalog Volume)
run = fm.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    train_data_path="/Volumes/<your_catalog>/<schema>/<volume>/train.jsonl",
    task_type="CHAT_COMPLETION",
    register_to="<your_catalog>.<schema>.<model_name>",
    training_duration="1ep",
    learning_rate="5e-7",
)
print(run.name, run.status)

The training data should be a JSONL file with chat-format rows like:
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}Once you call fm.create(), the run will then show up in the Experiments tab as an MLflow run. If you still hit "model not available" after running get_models(), share the exact error and we can dig into it.