<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic streaming llm response in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/streaming-llm-response/m-p/122027#M963</link>
    <description>&lt;P&gt;I am deploying an agent that works good withouth streaming:&lt;BR /&gt;&lt;BR /&gt;it is using the following packages:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;mlflow==2.22.1&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;langgraph&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;langchain&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;pydantic==2.8.2&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;langgraph-checkpoint-sqlite&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-langchain&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;pypdf&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-vectorsearch&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;langchain_core&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-feature-store&amp;gt;=0.13.0&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;nest_asyncio&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-sdk==0.50.0&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-agents==0.20.0&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;My implementation is based on this link:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent#streaming-output-agents" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent#streaming-output-agents&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Inside the notebook works good but after i deploy i get:&lt;/DIV&gt;&lt;DIV&gt;[7gwmf] [2025-06-17 17:21:50 +0000] Encountered an unexpected error while parsing the input data. Error 'This model does not support predict_stream method.'&lt;BR /&gt;[7gwmf] Traceback (most recent call last):&lt;BR /&gt;[7gwmf] File "/opt/conda/envs/mlflow-env/lib/python3.11/site-packages/mlflowserving/scoring_server/__init__.py", line 670, in transformation&lt;BR /&gt;[7gwmf] raise MlflowException("This model does not support predict_stream method.")&lt;BR /&gt;[7gwmf] mlflow.exceptions.MlflowException: This model does not support predict_stream method.&lt;BR /&gt;[7gwmf]&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;the mlflow page&amp;nbsp;&lt;A href="https://mlflow.org/releases/2.19.0" target="_blank" rel="noopener"&gt;https://mlflow.org/releases/2.19.0&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;it says:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;ChatModel enhancements&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;-&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://mlflow.org/docs/latest/llms/chat-model-guide/index.html" target="_blank" rel="noopener noreferrer"&gt;ChatModel&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;now adopts&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatCompletionRequest&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatCompletionResponse&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;as its new schema. The&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;predict_stream&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;interface uses&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatCompletionChunk&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to deliver true streaming responses. Additionally, the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;custom_inputs&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;custom_outputs&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;fields in ChatModel now utilize&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;AnyType, enabling support for a wider variety of data types.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Note:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;In a future version of MLflow,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatParams&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(and by extension,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatCompletionRequest) will have the default values for&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;n,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;temperature, and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;stream&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;removed. (&lt;A href="https://github.com/mlflow/mlflow/pull/13782" target="_blank" rel="noopener noreferrer"&gt;#13782&lt;/A&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://github.com/mlflow/mlflow/pull/13857" target="_blank" rel="noopener noreferrer"&gt;#13857&lt;/A&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://github.com/stevenchen-db" target="_blank" rel="noopener noreferrer"&gt;@stevenchen-db&lt;/A&gt;)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;SPAN&gt;&lt;BR /&gt;What do i need to do to correctly have an implement the streaming for the llm i am working on.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Tue, 17 Jun 2025 17:47:46 GMT</pubDate>
    <dc:creator>chunky35</dc:creator>
    <dc:date>2025-06-17T17:47:46Z</dc:date>
    <item>
      <title>streaming llm response</title>
      <link>https://community.databricks.com/t5/generative-ai/streaming-llm-response/m-p/122027#M963</link>
      <description>&lt;P&gt;I am deploying an agent that works good withouth streaming:&lt;BR /&gt;&lt;BR /&gt;it is using the following packages:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;mlflow==2.22.1&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;langgraph&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;langchain&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;pydantic==2.8.2&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;langgraph-checkpoint-sqlite&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-langchain&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;pypdf&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-vectorsearch&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;langchain_core&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-feature-store&amp;gt;=0.13.0&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;nest_asyncio&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-sdk==0.50.0&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;databricks-agents==0.20.0&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;My implementation is based on this link:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent#streaming-output-agents" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent#streaming-output-agents&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Inside the notebook works good but after i deploy i get:&lt;/DIV&gt;&lt;DIV&gt;[7gwmf] [2025-06-17 17:21:50 +0000] Encountered an unexpected error while parsing the input data. Error 'This model does not support predict_stream method.'&lt;BR /&gt;[7gwmf] Traceback (most recent call last):&lt;BR /&gt;[7gwmf] File "/opt/conda/envs/mlflow-env/lib/python3.11/site-packages/mlflowserving/scoring_server/__init__.py", line 670, in transformation&lt;BR /&gt;[7gwmf] raise MlflowException("This model does not support predict_stream method.")&lt;BR /&gt;[7gwmf] mlflow.exceptions.MlflowException: This model does not support predict_stream method.&lt;BR /&gt;[7gwmf]&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;the mlflow page&amp;nbsp;&lt;A href="https://mlflow.org/releases/2.19.0" target="_blank" rel="noopener"&gt;https://mlflow.org/releases/2.19.0&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;it says:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;ChatModel enhancements&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;-&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://mlflow.org/docs/latest/llms/chat-model-guide/index.html" target="_blank" rel="noopener noreferrer"&gt;ChatModel&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;now adopts&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatCompletionRequest&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatCompletionResponse&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;as its new schema. The&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;predict_stream&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;interface uses&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatCompletionChunk&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to deliver true streaming responses. Additionally, the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;custom_inputs&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;custom_outputs&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;fields in ChatModel now utilize&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;AnyType, enabling support for a wider variety of data types.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Note:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;In a future version of MLflow,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatParams&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(and by extension,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;ChatCompletionRequest) will have the default values for&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;n,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;temperature, and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;stream&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;removed. (&lt;A href="https://github.com/mlflow/mlflow/pull/13782" target="_blank" rel="noopener noreferrer"&gt;#13782&lt;/A&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://github.com/mlflow/mlflow/pull/13857" target="_blank" rel="noopener noreferrer"&gt;#13857&lt;/A&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://github.com/stevenchen-db" target="_blank" rel="noopener noreferrer"&gt;@stevenchen-db&lt;/A&gt;)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;SPAN&gt;&lt;BR /&gt;What do i need to do to correctly have an implement the streaming for the llm i am working on.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 17 Jun 2025 17:47:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/streaming-llm-response/m-p/122027#M963</guid>
      <dc:creator>chunky35</dc:creator>
      <dc:date>2025-06-17T17:47:46Z</dc:date>
    </item>
    <item>
      <title>Re: streaming llm response</title>
      <link>https://community.databricks.com/t5/generative-ai/streaming-llm-response/m-p/133457#M1184</link>
      <description>&lt;DIV class="bg-base -mx-md px-md sticky-tabs-ref erp-sidecar:sticky top-0 z-10 md:sticky"&gt;
&lt;DIV class="mx-auto max-w-threadContentWidth"&gt;
&lt;DIV class="relative"&gt;
&lt;DIV data-testid="answer-mode-tabs"&gt;
&lt;DIV class="relative border-b border-subtlest ring-subtlest divide-subtlest bg-transparent"&gt;
&lt;DIV class="-mx-md px-md scrollbar-none -my-px overflow-x-auto overflow-y-hidden py-px"&gt;
&lt;DIV class="table w-full"&gt;
&lt;DIV class="gap-xs relative border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="relative flex gap-1.5"&gt;
&lt;DIV class="md:max-w-threadContentWidth md:mx-auto md:w-full"&gt;
&lt;DIV class="grid grid-cols-1 grid-rows-1 -mx-sm"&gt;
&lt;DIV class="gap-xs relative col-start-1 row-start-1 flex"&gt;
&lt;DIV class="ml-auto"&gt;&lt;SPAN&gt;To implement streaming output for your agent in Databricks and resolve the error&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;"This model does not support predict_stream method."&lt;/CODE&gt;&lt;SPAN&gt;, the key requirement is that your underlying MLflow model must support the&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;predict_stream&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;method. Most likely, your current registered MLflow model is not using a&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;ChatModel&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;implementation or LLM wrapper that supports streaming, so standard&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.predict()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;works but&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.predict_stream()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;does not.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="pb-md mx-auto pt-5 md:pb-12 max-w-threadContentWidth"&gt;
&lt;DIV class="relative"&gt;
&lt;DIV class="gap-y-sm flex flex-col"&gt;
&lt;DIV class="gap-y-lg mt-3 flex flex-col first:mt-0"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-transparent"&gt;
&lt;DIV class="gap-y-sm md:gap-y-md flex flex-col"&gt;
&lt;DIV class="relative font-sans text-base text-foreground selection:bg-super/50 selection:text-foreground dark:selection:bg-super/10 dark:selection:text-super"&gt;
&lt;DIV class="min-w-0 break-words [word-break:break-word]"&gt;
&lt;DIV id="markdown-content-0" class="gap-y-md after:clear-both after:block after:content-['']" dir="auto"&gt;
&lt;DIV class="relative"&gt;
&lt;DIV class="prose text-pretty dark:prose-invert inline leading-relaxed break-words min-w-0 [word-break:break-word] prose-strong:font-medium"&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Why This Error Occurs&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Streaming interface&lt;/STRONG&gt;: The MLflow model must implement the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;predict_stream&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;method (using MLflow’s LLM/ChatModel interface).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Model registration&lt;/STRONG&gt;: If you saved your model with MLflow but did not use an LLM/ChatModel wrapper that supports streaming, only standard prediction will work; streaming will fail.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Correct save&lt;/STRONG&gt;: The model in MLflow must be saved using a method/class that exposes the streaming endpoint, not just the standard predict endpoint.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;How to Resolve&lt;/H2&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;1. Use a Supported ChatModel With Streaming&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Ensure you are using an MLflow ChatModel implementation that supports streaming, e.g. OpenAI, Databricks MosaicML, or similar. When saving the model, use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;mlflow.langchain.save_model()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or similar, specifying the appropriate class that includes the streaming method.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;2. Implement Streaming in Your Model&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Your ChatModel class (or whichever class is wrapped for MLflow model serving) should have a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;predict_stream&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;method implemented.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;In LangChain and LangGraph settings, ensure the LLM object supports streaming (set&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;stream=True&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and use classes/interfaces that yield partial outputs).&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;3. Register and Deploy the Streaming Model&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Save the model using the appropriate MLflow saving function that retains the streaming capabilities.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When registering/deploying, the model artifact must expose&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;predict_stream&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;4. Check Your Deployment Code&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When deploying the agent, ensure your inference endpoint is properly configured to use the streaming schema per the latest MLflow documentation.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Example: MLflow Streaming ChatModel&lt;/H2&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl sticky top-0 flex h-0 items-start justify-end"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="pr-lg"&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; mlflow
&lt;SPAN class="token token"&gt;from&lt;/SPAN&gt; mlflow&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;langchain &lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; save_model
&lt;SPAN class="token token"&gt;from&lt;/SPAN&gt; langchain&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;chat_models &lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; ChatOpenAI

&lt;SPAN class="token token"&gt;# Setup your LLM with streaming enabled&lt;/SPAN&gt;
llm &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; ChatOpenAI&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;temperature&lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;0.1&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; streaming&lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token token boolean"&gt;True&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;

&lt;SPAN class="token token"&gt;# Save model using MLflow&lt;/SPAN&gt;
save_model&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;
    llm&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt;
    path&lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"llm_model_streaming"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt;
    mlflow_model_flavor&lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"langchain"&lt;/SPAN&gt;
&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Ensure the ChatModel (&lt;CODE&gt;ChatOpenAI&lt;/CODE&gt;, MosaicML, etc.) supports streaming out of the box and is saved with that capability.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;References to the Official Docs&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The official [Databricks agent streaming guide], and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="hover:text-super hover:decoration-super break-words underline decoration-from-font underline-offset-1 transition-all duration-300" href="https://mlflow.org/docs/latest/llms/chat-model-guide/index.html" target="_blank" rel="nofollow noopener"&gt;MLflow ChatModel/Streaming&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;documentation: confirm the streaming interface is present and properly implemented when you save and subsequently deploy the model.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2 id="key-steps-to-fix" class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0 md:text-lg [hr+&amp;amp;]:mt-4"&gt;Key Steps to Fix&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Verify that your saving function in MLflow (e.g.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;save_model()&lt;/CODE&gt;) saves a streaming-capable ChatModel.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Re-register the model in MLflow after confirming that the underlying implementation is compatible with streaming.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Update deployment code or configs to use the streaming endpoint (&lt;CODE&gt;predict_stream&lt;/CODE&gt;).&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;If the underlying LLM class or deployment does not support streaming, you must swap to a compatible class and redeploy&lt;/STRONG&gt;.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Table: Error Cause and Resolution&lt;/H2&gt;
&lt;DIV class="group relative"&gt;
&lt;DIV class="w-full overflow-x-auto md:max-w-[90vw] border-subtlest ring-subtlest divide-subtlest bg-transparent"&gt;
&lt;TABLE class="border-subtler my-[1em] w-full table-auto border-separate border-spacing-0 border-l border-t"&gt;
&lt;THEAD class="bg-subtler"&gt;
&lt;TR&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Cause&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Resolution&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Model lacks&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;predict_stream&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;method&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Save with streaming ChatModel&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Wrong MLflow save function or model class&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;mlflow.langchain.save_model&lt;/CODE&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;LLM streaming not enabled in config&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Set&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;stream=True&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in LLM params&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;DIV class="bg-base border-subtler shadow-subtle pointer-coarse:opacity-100 right-xs absolute bottom-0 flex rounded-lg border opacity-0 transition-opacity group-hover:opacity-100 [&amp;amp;&amp;gt;*:not(:first-child)]:border-subtle [&amp;amp;&amp;gt;*:not(:first-child)]:border-l"&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;HR /&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Implement these corrections, re-save and deploy your MLflow model, and the streaming output should work for your agent in Databricks&lt;/STRONG&gt;.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Wed, 01 Oct 2025 13:42:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/streaming-llm-response/m-p/133457#M1184</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-10-01T13:42:57Z</dc:date>
    </item>
  </channel>
</rss>

