Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Tool Calls with Workspace Models

thmonte
New Contributor II

I recently followed the blog post on running the DeepSeek Llama distilled model, then served it via Serving Endpoints with provisioned throughput. In my use case I am using pydantic-ai to build out some simple agents for testing. With this style of deployment, the agent seems unable to make multiple tool calls: once the LLM responds with an 'assistant' role message containing a tool call, passing the full message history, including that tool call's result, back to the endpoint produces the following error:

Model does not support continuing the chat past the first tool call

I believe this has to do with how the serving endpoints are configured when using the 'llm/v1/chat' task, but I could be wrong.
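
For reference, here is roughly the flow that fails, stripped down to the raw OpenAI-compatible client rather than pydantic-ai (a minimal sketch; the workspace URL, token, endpoint name, and tool are all placeholders):

from openai import OpenAI

# Databricks serving endpoints expose an OpenAI-compatible chat interface.
client = OpenAI(
    base_url="https://<workspace-host>/serving-endpoints",
    api_key="<databricks-token>",
)

# Hypothetical tool, just to trigger a tool call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# First call: the model answers with an 'assistant' message containing a tool call.
first = client.chat.completions.create(
    model="<endpoint-name>", messages=messages, tools=tools
)
assistant_msg = first.choices[0].message
messages.append(assistant_msg)

# Append the tool result and send the full history back...
messages.append({
    "role": "tool",
    "tool_call_id": assistant_msg.tool_calls[0].id,
    "content": '{"temp_c": 18}',
})

# ...and this second call is the one that raises
# "Model does not support continuing the chat past the first tool call".
second = client.chat.completions.create(
    model="<endpoint-name>", messages=messages, tools=tools
)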

Is a way around this to build out the inference configuration manually? Would I lose any functionality by doing so?

The only models this currently works with are the foundation models that support function calling, e.g. databricks-meta-llama-3-3-70b-instruct.

Any guidance here would be great!

2 REPLIES

Alberto_Umana
Databricks Employee

Hello @thmonte,

You can define the model signature, including input, output, and inference parameters, to ensure the model can handle the required interactions. This involves specifying parameters such as temperature, max_tokens, stop, and other relevant settings. Also make sure the endpoint is configured with appropriate provisioned throughput for the expected load.

Here's an example: 

from mlflow.models import infer_signature
import mlflow

# Define the model signature, including inference params
input_example = {"prompt": "What is Machine Learning?"}
inference_config = {
    "temperature": 1.0,
    "max_new_tokens": 100,
    "do_sample": True,
    "repetition_penalty": 1.15,  # Custom parameter example
}
signature = infer_signature(
    model_input=input_example,
    model_output="Machine Learning is...",
    params=inference_config,
)

# Log the model with its artifacts, pip requirements, and input example.
# `model` and `tokenizer` are assumed to be an already-loaded
# transformers model/tokenizer pair.
with mlflow.start_run() as run:
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        task="llm/v1/chat",
        signature=signature,
        input_example=input_example,
        registered_model_name="custom_llm_model",
    )
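
Once the model is logged with params in the signature, those values act as defaults that callers can override per request. A quick way to verify locally (a sketch reusing the prompt-style input from the example above; it assumes the run variable from that snippet and MLflow's params argument on predict):

import mlflow

# Load the logged model back and override signature params at inference time.
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
response = loaded.predict(
    {"prompt": "What is Machine Learning?"},
    params={"temperature": 0.2, "max_new_tokens": 50},  # overrides the logged defaults
)
print(response)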

thmonte
New Contributor II

Thanks @Alberto_Umana 

Which one of these settings allows the conversation to continue past the first tool call? Is there documentation on all the configurable fields? Also, does this still allow overriding some of them at the client level, e.g. passing in temperature when calling the LLM? (There's a sketch of what I mean below the config.)

inference_config = {
    "temperature": 1.0,
    "max_new_tokens": 100,
    "do_sample": True,
    "repetition_penalty": 1.15,  # Custom parameter example
}
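
To be concrete, by a client-level override I mean something like this (a sketch against the endpoint's OpenAI-compatible interface; workspace URL, token, and endpoint name are placeholders):

from openai import OpenAI

client = OpenAI(
    base_url="https://<workspace-host>/serving-endpoints",
    api_key="<databricks-token>",
)

# Per-request override: does this take precedence over the
# temperature of 1.0 logged in inference_config?
response = client.chat.completions.create(
    model="<endpoint-name>",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.2,
)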



I did deploy the model in a similar way to what you described, but did not pass in signature and input_example.

import mlflow
from transformers import AutoModelForCausalLM, AutoTokenizer

task = "llm/v1/chat"
# model_path points at the locally downloaded checkpoint
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

transformers_model = {"model": model, "tokenizer": tokenizer}

with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=transformers_model,
        artifact_path="model",
        task=task,
        registered_model_name="model_name",
        metadata={
            "task": task,
            "pretrained_model_name": "meta-llama/Llama-3.3-70B-Instruct",
            "databricks_model_family": "LlamaForCausalLM",
            "databricks_model_size_parameters": "8b",
        },
    )