- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-17-2025 10:47 AM
I am deploying an agent that works good withouth streaming:
it is using the following packages:
https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent#streaming-output-agent...
Inside the notebook works good but after i deploy i get:
[7gwmf] Traceback (most recent call last):
[7gwmf] File "/opt/conda/envs/mlflow-env/lib/python3.11/site-packages/mlflowserving/scoring_server/__init__.py", line 670, in transformation
[7gwmf] raise MlflowException("This model does not support predict_stream method.")
[7gwmf] mlflow.exceptions.MlflowException: This model does not support predict_stream method.
[7gwmf]
it says:
ChatModel enhancements - ChatModel now adopts ChatCompletionRequest and ChatCompletionResponse as its new schema. The predict_stream interface uses ChatCompletionChunk to deliver true streaming responses. Additionally, the custom_inputs and custom_outputs fields in ChatModel now utilize AnyType, enabling support for a wider variety of data types. Note: In a future version of MLflow, ChatParams (and by extension, ChatCompletionRequest) will have the default values for n, temperature, and stream removed. (#13782, #13857, @stevenchen-db)
What do i need to do to correctly have an implement the streaming for the llm i am working on.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-01-2025 06:42 AM
"This model does not support predict_stream method.", the key requirement is that your underlying MLflow model must support the predict_stream method. Most likely, your current registered MLflow model is not using a ChatModel implementation or LLM wrapper that supports streaming, so standard .predict() works but .predict_stream() does not.Why This Error Occurs
-
Streaming interface: The MLflow model must implement the
predict_streammethod (using MLflow’s LLM/ChatModel interface). -
Model registration: If you saved your model with MLflow but did not use an LLM/ChatModel wrapper that supports streaming, only standard prediction will work; streaming will fail.
-
Correct save: The model in MLflow must be saved using a method/class that exposes the streaming endpoint, not just the standard predict endpoint.
How to Resolve
1. Use a Supported ChatModel With Streaming
Ensure you are using an MLflow ChatModel implementation that supports streaming, e.g. OpenAI, Databricks MosaicML, or similar. When saving the model, use mlflow.langchain.save_model() or similar, specifying the appropriate class that includes the streaming method.
2. Implement Streaming in Your Model
-
Your ChatModel class (or whichever class is wrapped for MLflow model serving) should have a
predict_streammethod implemented. -
In LangChain and LangGraph settings, ensure the LLM object supports streaming (set
stream=Trueand use classes/interfaces that yield partial outputs).
3. Register and Deploy the Streaming Model
-
Save the model using the appropriate MLflow saving function that retains the streaming capabilities.
-
When registering/deploying, the model artifact must expose
predict_stream.
4. Check Your Deployment Code
When deploying the agent, ensure your inference endpoint is properly configured to use the streaming schema per the latest MLflow documentation.
Example: MLflow Streaming ChatModel
import mlflow
from mlflow.langchain import save_model
from langchain.chat_models import ChatOpenAI
# Setup your LLM with streaming enabled
llm = ChatOpenAI(temperature=0.1, streaming=True)
# Save model using MLflow
save_model(
llm,
path="llm_model_streaming",
mlflow_model_flavor="langchain"
)
-
Ensure the ChatModel (
ChatOpenAI, MosaicML, etc.) supports streaming out of the box and is saved with that capability.
References to the Official Docs
The official [Databricks agent streaming guide], and MLflow ChatModel/Streaming documentation: confirm the streaming interface is present and properly implemented when you save and subsequently deploy the model.
Key Steps to Fix
-
Verify that your saving function in MLflow (e.g.
save_model()) saves a streaming-capable ChatModel. -
Re-register the model in MLflow after confirming that the underlying implementation is compatible with streaming.
-
Update deployment code or configs to use the streaming endpoint (
predict_stream).
If the underlying LLM class or deployment does not support streaming, you must swap to a compatible class and redeploy.
Table: Error Cause and Resolution
| Cause | Resolution |
|---|---|
Model lacks predict_stream method |
Save with streaming ChatModel |
| Wrong MLflow save function or model class | Use mlflow.langchain.save_model |
| LLM streaming not enabled in config | Set stream=True in LLM params |
Implement these corrections, re-save and deploy your MLflow model, and the streaming output should work for your agent in Databricks.