<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Tool Calls with Workspace Models in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/tool-calls-with-workspace-models/m-p/108598#M737</link>
    <description>&lt;P&gt;I recently followed the blog post on running the DeepSeek Llama distilled model, then served it via Serving Endpoints with provisioned throughput. In my use case I am using pydantic-ai to build out some simple agents for testing. With this style of deployment, the agent seems unable to make multiple tool calls. Once the LLM responds with an 'assistant' role, if I pass the full message history, including the response from that tool call, back in, I get the following error:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Model does not support continuing the chat past the first tool call&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;I believe this has to do with how the serving endpoints are configured when using 'llm/v1/chat', but I could be wrong.&lt;BR /&gt;&lt;BR /&gt;Is the way around this to build out the inference configuration manually? Would I lose any functionality?&lt;BR /&gt;&lt;BR /&gt;The only models this currently works with are the foundation models that support function calling, e.g.&amp;nbsp;databricks-meta-llama-3-3-70b-instruct.&lt;BR /&gt;&lt;BR /&gt;Any guidance here would be great!&lt;/P&gt;</description>
    <pubDate>Mon, 03 Feb 2025 15:02:21 GMT</pubDate>
    <dc:creator>thmonte</dc:creator>
    <dc:date>2025-02-03T15:02:21Z</dc:date>
    <item>
      <title>Tool Calls with Workspace Models</title>
      <link>https://community.databricks.com/t5/generative-ai/tool-calls-with-workspace-models/m-p/108598#M737</link>
      <description>&lt;P&gt;I recently followed the blog post on running the DeepSeek Llama distilled model, then served it via Serving Endpoints with provisioned throughput. In my use case I am using pydantic-ai to build out some simple agents for testing. With this style of deployment, the agent seems unable to make multiple tool calls. Once the LLM responds with an 'assistant' role, if I pass the full message history, including the response from that tool call, back in, I get the following error:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Model does not support continuing the chat past the first tool call&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;I believe this has to do with how the serving endpoints are configured when using 'llm/v1/chat', but I could be wrong.&lt;BR /&gt;&lt;BR /&gt;Is the way around this to build out the inference configuration manually? Would I lose any functionality?&lt;BR /&gt;&lt;BR /&gt;The only models this currently works with are the foundation models that support function calling, e.g.&amp;nbsp;databricks-meta-llama-3-3-70b-instruct.&lt;BR /&gt;&lt;BR /&gt;Any guidance here would be great!&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 15:02:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/tool-calls-with-workspace-models/m-p/108598#M737</guid>
      <dc:creator>thmonte</dc:creator>
      <dc:date>2025-02-03T15:02:21Z</dc:date>
    </item>
    <item>
      <title>Re: Tool Calls with Workspace Models</title>
      <link>https://community.databricks.com/t5/generative-ai/tool-calls-with-workspace-models/m-p/108608#M738</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/147424"&gt;@thmonte&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;You can define the model signature, including input and output parameters, to ensure that the model can handle the required interactions. This involves specifying parameters such as &lt;CODE&gt;temperature&lt;/CODE&gt;, &lt;CODE&gt;max_tokens&lt;/CODE&gt;, &lt;CODE&gt;stop&lt;/CODE&gt;, and other relevant settings.&amp;nbsp;Make sure that your endpoint is configured with the appropriate provisioned throughput settings to handle the expected load and interactions.&lt;/P&gt;
&lt;P&gt;Here's an example:&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import mlflow
from mlflow.models import infer_signature

# Define the model signature, including inference params
input_example = {"prompt": "What is Machine Learning?"}
inference_config = {
    "temperature": 1.0,
    "max_new_tokens": 100,
    "do_sample": True,
    "repetition_penalty": 1.15,  # Custom parameter example
}
signature = infer_signature(
    model_input=input_example,
    model_output="Machine Learning is...",
    params=inference_config,
)

# Log the model with its details such as artifacts, pip requirements, and
# input example (assumes `model` and `tokenizer` are already loaded)
with mlflow.start_run() as run:
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        task="llm/v1/chat",
        signature=signature,
        input_example=input_example,
        registered_model_name="custom_llm_model",
    )&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 03 Feb 2025 15:57:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/tool-calls-with-workspace-models/m-p/108608#M738</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-03T15:57:11Z</dc:date>
    </item>
    <item>
      <title>Re: Tool Calls with Workspace Models</title>
      <link>https://community.databricks.com/t5/generative-ai/tool-calls-with-workspace-models/m-p/108636#M739</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;Which of these settings allows the conversation to continue past the first tool call? Is there documentation on all configurable fields? Also, does this still allow overriding some of these at the client level, e.g. passing in temperature when calling the LLM?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;inference_config = {
    "temperature": 1.0,
    "max_new_tokens": 100,
    "do_sample": True,
    "repetition_penalty": 1.15,  # Custom parameter example
}&lt;/LI-CODE&gt;&lt;P&gt;I did deploy the model in a similar way to what you described, but did not pass in signature or input_example.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import mlflow
from transformers import AutoModelForCausalLM, AutoTokenizer

task = "llm/v1/chat"
model = AutoModelForCausalLM.from_pretrained(model_path)  # model_path defined elsewhere
tokenizer = AutoTokenizer.from_pretrained(model_path)

transformers_model = {"model": model, "tokenizer": tokenizer}

with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=transformers_model,
        artifact_path="model",
        task=task,
        registered_model_name='model_name',
        metadata={
            "task": task,
            "pretrained_model_name": "meta-llama/Llama-3.3-70B-Instruct",
            "databricks_model_family": "LlamaForCausalLM",
            "databricks_model_size_parameters": "8b",
        },
    )
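
# For reference, a sketch (OpenAI-style chat format, names and values are
# illustrative) of the multi-turn tool-call history that fails against this
# endpoint: after the assistant's tool call, the tool result is appended and
# the full history is sent back, at which point the endpoint returns
# "Model does not support continuing the chat past the first tool call".
messages = [
    {"role": "user", "content": "What is the weather in NYC?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "NYC"}'},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "Sunny, 20 C"},
]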
&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 03 Feb 2025 17:36:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/tool-calls-with-workspace-models/m-p/108636#M739</guid>
      <dc:creator>thmonte</dc:creator>
      <dc:date>2025-02-03T17:36:45Z</dc:date>
    </item>
  </channel>
</rss>

