Hello @thmonte,
You can define the model signature, including input and output parameters, to ensure that the model can handle the required interactions. This involves specifying inference parameters such as temperature, max_tokens, stop, and other relevant settings. Also make sure that your endpoint is configured with the appropriate provisioned throughput settings to handle the expected load and interactions.
Here's an example:
from mlflow.models import infer_signature
import mlflow
# Define model signature including params
input_example = {"prompt": "What is Machine Learning?"}
inference_config = {
"temperature": 1.0,
"max_new_tokens": 100,
"do_sample": True,
"repetition_penalty": 1.15, # Custom parameter example
}
signature = infer_signature(
    model_input=input_example,
    model_output="Machine Learning is...",
    params=inference_config,
)
# Log the model with its details such as artifacts, pip requirements, and input example
with mlflow.start_run() as run:
    # model and tokenizer are the Hugging Face objects you loaded earlier
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        task="llm/v1/chat",
        signature=signature,
        input_example=input_example,
        registered_model_name="custom_llm_model",
    )
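Once the model is logged with params in its signature, you can override those defaults at inference time by passing params to predict. Here's a minimal sketch, assuming the model above was registered as custom_llm_model and that version 1 is the one you want to load (both are placeholders from the example, not fixed values):

import mlflow

# Load the registered model back as a pyfunc
# (model URI assumes version 1 of the registered model from the example above)
loaded_model = mlflow.pyfunc.load_model("models:/custom_llm_model/1")

# Override the default inference params captured in the signature at predict time
response = loaded_model.predict(
    {"prompt": "What is Machine Learning?"},
    params={"temperature": 0.7, "max_new_tokens": 200},
)
print(response)

Any parameter not passed in params falls back to the default value captured in the signature (the inference_config values above).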