thmonte
New Contributor II

Thanks @Alberto_Umana 

Which one of these controls allows the conversation to continue past the first tool call? Is there documentation on all the configurable fields? Also, does this still allow overriding some of these at the client level, e.g. passing in temperature when calling the LLM?

inference_config = {
    "temperature": 1.0,
    "max_new_tokens": 100,
    "do_sample": True,
    "repetition_penalty": 1.15,  # Custom parameter example
}
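To make the override question concrete, here is a minimal sketch of the behavior I'm hoping for: values passed at call time win over the logged defaults. The helper `resolve_params` is hypothetical and only illustrates the merge semantics; it is not an MLflow or Databricks API, and I don't know whether the serving endpoint actually resolves parameters this way.

```python
# Defaults as logged with the model (copied from the config above).
DEFAULT_INFERENCE_CONFIG = {
    "temperature": 1.0,
    "max_new_tokens": 100,
    "do_sample": True,
    "repetition_penalty": 1.15,
}


def resolve_params(defaults, overrides=None):
    """Hypothetical helper: return a new dict where call-time values win.

    Keys set to None in `overrides` are ignored, so the default is kept.
    """
    merged = dict(defaults)  # copy so the logged defaults are not mutated
    for key, value in (overrides or {}).items():
        if value is not None:
            merged[key] = value
    return merged


# A client passing only temperature should leave the other defaults intact.
params = resolve_params(DEFAULT_INFERENCE_CONFIG, {"temperature": 0.2})
print(params)
```

If the endpoint behaves like this, a request that sets only temperature would still use the logged max_new_tokens and repetition_penalty.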



I deployed the model in much the same way you described, but I did not pass in signature or input_example.

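import mlflow
from transformers import AutoModelForCausalLM, AutoTokenizer

task = "llm/v1/chat"
# model_path is the location of the downloaded model weights (defined elsewhere)
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Bundle the model and tokenizer as components for the transformers flavor
transformers_model = {"model": model, "tokenizer": tokenizer}


with mlflow.start_run():
    # Log and register the model; no signature or input_example is passed
    model_info = mlflow.transformers.log_model(
        transformers_model=transformers_model,
        artifact_path="model",
        task=task,
        registered_model_name='model_name',
        metadata={
            "task": task,
            "pretrained_model_name": "meta-llama/Llama-3.3-70B-Instruct",
            "databricks_model_family": "LlamaForCausalLM",
            "databricks_model_size_parameters": "8b",
        },
    )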