Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Deploying a Hugging Face LLM with the MLflow task llm/v1/chat on Databricks

albert_herrando
New Contributor

Hello,

I am currently trying to deploy a Hugging Face LLM to Databricks with the MLflow task llm/v1/chat so that I can use it as a chat model.

I have tried several models, such as TinyLlama/TinyLlama_v1.1 (the one used in the code below).

However, once deployed, the models act very weirdly:

[Screenshot: albert_herrando_0-1747401869742.png]

[Screenshot: albert_herrando_1-1747401929720.png]

The code that I am using to log the models into Unity Catalog is the following:

%pip install transformers torch accelerate torchvision
dbutils.library.restartPython()

import mlflow
from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import ModelCard

model_id = "TinyLlama/TinyLlama_v1.1"

# Load the checkpoint and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# MLflow's transformers flavor accepts a dict of pipeline components
transformers_model = {"model": model, "tokenizer": tokenizer}

# For llm/v1/chat, MLflow infers the signature automatically from input_example
input_example = {
    "messages": [
        {
            "role": "user",
            "content": "Hello!"
        }
    ],
    # These are optional parameters for the llm/v1/chat endpoint
    # "temperature": 0.6,
    # "max_tokens": 300
}

# --- Unity Catalog Setup ---
# Make sure the catalog and schema exist in Unity Catalog
uc_catalog = "dts_proves_pre"
uc_schema = "llms"
registered_model_name = f"{uc_catalog}.{uc_schema}.TinyLlama_v1-1"

# Configure MLflow to use Unity Catalog
mlflow.set_registry_uri("databricks-uc")

# Log and register the model with the llm/v1/chat task so the serving
# endpoint exposes a chat-style request/response schema
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=transformers_model,
        task="llm/v1/chat",
        model_card=ModelCard.load(model_id),
        artifact_path="TinyLlama_v1.1-model",
        # signature is omitted: MLflow infers it from input_example (see above)
        input_example=input_example,
        registered_model_name=registered_model_name,
        extra_pip_requirements=["transformers", "torch", "torchvision", "accelerate"],
    )
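
For context, once the model is served I query the endpoint with the standard llm/v1/chat request shape. A minimal sketch of how I call it (the endpoint name "tinyllama-chat" here is just a placeholder for my actual serving endpoint):

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# "tinyllama-chat" is a placeholder for the actual serving endpoint name
response = client.predict(
    endpoint="tinyllama-chat",
    inputs={
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.6,
        "max_tokens": 300,
    },
)
print(response)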

I am encountering this problem with several LLMs from Hugging Face. It looks like there is a mismatch when the prompt is generated, or the chat template is not being applied correctly.
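
One thing I have been checking locally (a quick sketch, using the tokenizer loaded above) is whether the tokenizer actually ships a chat template, since llm/v1/chat depends on it to turn the messages list into a prompt:

# Does the tokenizer define a chat template at all? Base (non-chat)
# checkpoints may not ship one, in which case the messages cannot be
# formatted into the prompt style the model expects.
print(tokenizer.chat_template)

messages = [{"role": "user", "content": "Hello!"}]

# If a template exists, this prints the exact prompt string the model sees
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)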

Does anyone know what is happening or how to solve it?

Thank you very much in advance.

