streaming llm response

chunky35
New Contributor

I am deploying an agent that works well without streaming.

It uses the following packages:

      "mlflow==2.22.1",
      "langgraph",
      "langchain",
      "pydantic==2.8.2",
      "langgraph-checkpoint-sqlite",
      "databricks-langchain",
      "pypdf",
      "databricks-vectorsearch",
      "langchain_core",
      "databricks-feature-store>=0.13.0",
      "nest_asyncio",
      "databricks-sdk==0.50.0",
      "databricks-agents==0.20.0"
My implementation is based on this link:
https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent#streaming-output-agent...

It works fine inside the notebook, but after I deploy it I get:
[7gwmf] [2025-06-17 17:21:50 +0000] Encountered an unexpected error while parsing the input data. Error 'This model does not support predict_stream method.'
[7gwmf] Traceback (most recent call last):
[7gwmf] File "/opt/conda/envs/mlflow-env/lib/python3.11/site-packages/mlflowserving/scoring_server/__init__.py", line 670, in transformation
[7gwmf] raise MlflowException("This model does not support predict_stream method.")
[7gwmf] mlflow.exceptions.MlflowException: This model does not support predict_stream method.
[7gwmf]
 
The MLflow 2.19.0 release notes (https://mlflow.org/releases/2.19.0) say:
  • ChatModel enhancements - ChatModel now adopts ChatCompletionRequest and ChatCompletionResponse as its new schema. The predict_stream interface uses ChatCompletionChunk to deliver true streaming responses. Additionally, the custom_inputs and custom_outputs fields in ChatModel now utilize AnyType, enabling support for a wider variety of data types. Note: In a future version of MLflow, ChatParams (and by extension, ChatCompletionRequest) will have the default values for n, temperature, and stream removed. (#13782, #13857, @stevenchen-db)


What do I need to do to correctly implement streaming for the LLM I am working on?
1 ACCEPTED SOLUTION

Accepted Solutions

mark_ott
Databricks Employee
To implement streaming output for your agent in Databricks and resolve the error "This model does not support predict_stream method.", the key requirement is that your underlying MLflow model must support the predict_stream method. Most likely, your current registered MLflow model is not using a ChatModel implementation or LLM wrapper that supports streaming, so standard .predict() works but .predict_stream() does not.

Why This Error Occurs

  • Streaming interface: The MLflow model must implement the predict_stream method (using MLflow’s LLM/ChatModel interface).

  • Model registration: If you saved your model with MLflow but did not use an LLM/ChatModel wrapper that supports streaming, only standard prediction will work; streaming will fail.

  • Correct save: The model in MLflow must be saved using a method/class that exposes the streaming endpoint, not just the standard predict endpoint.

How to Resolve

1. Use a Supported ChatModel With Streaming

Ensure you are using an MLflow ChatModel implementation that supports streaming, e.g. OpenAI, Databricks MosaicML, or similar. When saving the model, use mlflow.langchain.save_model() or similar, specifying the appropriate class that includes the streaming method.

2. Implement Streaming in Your Model

  • Your ChatModel class (or whichever class is wrapped for MLflow model serving) should have a predict_stream method implemented.

  • In LangChain and LangGraph settings, ensure the LLM object supports streaming (set streaming=True and use classes/interfaces that yield partial outputs). A minimal sketch of such a wrapper follows this list.
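
For illustration, here is a minimal sketch of a custom MLflow ChatModel that implements both predict and predict_stream, assuming MLflow 2.19 or later. The StreamingAgent class and the _run_agent/_stream_agent helpers are placeholders invented for this sketch, not part of your agent; replace their bodies with calls into your LangGraph graph.

python
from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
    ChatChoice,
    ChatChoiceDelta,
    ChatChunkChoice,
    ChatCompletionChunk,
    ChatCompletionResponse,
    ChatMessage,
)

class StreamingAgent(ChatModel):
    def predict(self, context, messages, params=None):
        # Non-streaming path: build and return one complete response.
        answer = self._run_agent(messages)
        return ChatCompletionResponse(
            choices=[ChatChoice(index=0, message=ChatMessage(role="assistant", content=answer))]
        )

    def predict_stream(self, context, messages, params=None):
        # Streaming path: yield ChatCompletionChunk objects as tokens arrive.
        for token in self._stream_agent(messages):
            yield ChatCompletionChunk(
                choices=[ChatChunkChoice(index=0, delta=ChatChoiceDelta(role="assistant", content=token))]
            )

    def _run_agent(self, messages):
        # Placeholder: invoke your LangGraph/LangChain agent and return the full answer.
        return "full answer"

    def _stream_agent(self, messages):
        # Placeholder: iterate over your agent's streaming output (e.g. graph.stream(...)).
        yield from ["partial ", "answer"]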

3. Register and Deploy the Streaming Model

  • Save the model using the appropriate MLflow saving function that retains the streaming capabilities.

  • When registering/deploying, the model artifact must expose predict_stream (an example flow is shown after this list).
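
As a sketch of that flow (assuming the StreamingAgent class from the sketch above, a Unity Catalog registry, and a hypothetical model name main.default.streaming_agent):

python
import mlflow
from databricks import agents

mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run():
    # Logging a ChatModel as a pyfunc keeps predict_stream available on the served model.
    model_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model=StreamingAgent(),
        registered_model_name="main.default.streaming_agent",  # hypothetical UC name
    )

# Deploy the registered version with the Databricks agent framework.
agents.deploy("main.default.streaming_agent", model_info.registered_model_version)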

4. Check Your Deployment Code

When deploying the agent, ensure your inference endpoint is properly configured to use the streaming schema per the latest MLflow documentation.
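
Before redeploying, one way to confirm the logged model actually exposes streaming is to load it back with pyfunc and call predict_stream locally (supported in recent MLflow versions). The request payload below is illustrative and assumes the standard chat schema:

python
import mlflow

loaded = mlflow.pyfunc.load_model(model_info.model_uri)

request = {"messages": [{"role": "user", "content": "Hello, please stream your answer"}]}
for chunk in loaded.predict_stream(request):
    print(chunk)  # each chunk is a partial (streamed) piece of the response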

Example: MLflow Streaming ChatModel

python
import mlflow
from langchain_openai import ChatOpenAI  # requires the langchain-openai package

# Set up the LLM with streaming enabled
llm = ChatOpenAI(temperature=0.1, streaming=True)

# Save it with the MLflow LangChain flavor
mlflow.langchain.save_model(lc_model=llm, path="llm_model_streaming")
  • Ensure the ChatModel (ChatOpenAI, MosaicML, etc.) supports streaming out of the box and is saved with that capability.
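
If you take the LangChain-flavor route shown above, a quick sanity check is to reload the saved model and stream from it directly. This sketch assumes the llm_model_streaming path from the example and valid OpenAI credentials in the environment:

python
import mlflow

# Reload the LangChain chat model saved above and stream tokens from it.
reloaded = mlflow.langchain.load_model("llm_model_streaming")
for chunk in reloaded.stream("Tell me a short joke"):
    print(chunk.content, end="", flush=True)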

References to the Official Docs

See the official Databricks agent streaming guide and the MLflow ChatModel/streaming documentation to confirm the streaming interface is present and properly implemented when you save and subsequently deploy the model.


Key Steps to Fix

  • Verify that your saving function in MLflow (e.g. save_model()) saves a streaming-capable ChatModel.

  • Re-register the model in MLflow after confirming that the underlying implementation is compatible with streaming.

  • Update deployment code or configs to use the streaming endpoint (predict_stream).

If the underlying LLM class or deployment does not support streaming, you must swap to a compatible class and redeploy.


Error Causes and Resolutions

  • Cause: Model lacks a predict_stream method. Resolution: Save with a streaming-capable ChatModel.
  • Cause: Wrong MLflow save function or model class. Resolution: Use mlflow.langchain.save_model (or log a pyfunc ChatModel).
  • Cause: LLM streaming not enabled in config. Resolution: Set streaming=True in the LLM parameters.

Implement these corrections, re-save and deploy your MLflow model, and the streaming output should work for your agent in Databricks.
