streaming llm response

chunky35
New Contributor

I am deploying an agent that works well without streaming.

It uses the following packages:

      "mlflow==2.22.1",
      "langgraph",
      "langchain",
      "pydantic==2.8.2",
      "langgraph-checkpoint-sqlite",
      "databricks-langchain",
      "pypdf",
      "databricks-vectorsearch",
      "langchain_core",
      "databricks-feature-store>=0.13.0",
      "nest_asyncio",
      "databricks-sdk==0.50.0",
      "databricks-agents==0.20.0"
My implementation is based on this link:
https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent#streaming-output-agent...

It works fine inside the notebook, but after I deploy it I get:
[7gwmf] [2025-06-17 17:21:50 +0000] Encountered an unexpected error while parsing the input data. Error 'This model does not support predict_stream method.'
[7gwmf] Traceback (most recent call last):
[7gwmf] File "/opt/conda/envs/mlflow-env/lib/python3.11/site-packages/mlflowserving/scoring_server/__init__.py", line 670, in transformation
[7gwmf] raise MlflowException("This model does not support predict_stream method.")
[7gwmf] mlflow.exceptions.MlflowException: This model does not support predict_stream method.
[7gwmf]
 
The MLflow 2.19.0 release notes (https://mlflow.org/releases/2.19.0) say:
  • ChatModel enhancements - ChatModel now adopts ChatCompletionRequest and ChatCompletionResponse as its new schema. The predict_stream interface uses ChatCompletionChunk to deliver true streaming responses. Additionally, the custom_inputs and custom_outputs fields in ChatModel now utilize AnyType, enabling support for a wider variety of data types. Note: In a future version of MLflow, ChatParams (and by extension, ChatCompletionRequest) will have the default values for n, temperature, and stream removed. (#13782, #13857, @stevenchen-db)


What do I need to do to correctly implement streaming for the LLM I am working on?
1 ACCEPTED SOLUTION

Accepted Solutions

mark_ott
Databricks Employee
To implement streaming output for your agent in Databricks and resolve the error "This model does not support predict_stream method.", the key requirement is that your underlying MLflow model must support the predict_stream method. Most likely, your current registered MLflow model is not using a ChatModel implementation or LLM wrapper that supports streaming, so standard .predict() works but .predict_stream() does not.

Why This Error Occurs

  • Streaming interface: The MLflow model must implement the predict_stream method (using MLflow’s LLM/ChatModel interface).

  • Model registration: If you saved your model with MLflow but did not use an LLM/ChatModel wrapper that supports streaming, only standard prediction will work; streaming will fail.

  • Correct save: The model in MLflow must be saved using a method/class that exposes the streaming endpoint, not just the standard predict endpoint.

How to Resolve

1. Use a Supported ChatModel With Streaming

Ensure you are using an MLflow ChatModel implementation that supports streaming, e.g. OpenAI, Databricks MosaicML, or similar. When saving the model, use mlflow.langchain.save_model() or similar, specifying the appropriate class that includes the streaming method.

2. Implement Streaming in Your Model

  • Your ChatModel class (or whichever class is wrapped for MLflow model serving) should have a predict_stream method implemented.

  • In LangChain and LangGraph settings, ensure the LLM object supports streaming (set streaming=True and use classes/interfaces that yield partial outputs). A minimal sketch of such a wrapper follows this list.
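
For illustration, here is a minimal sketch of a custom MLflow ChatModel that implements both predict and predict_stream, assuming MLflow 2.19 or later. The StreamingAgent class and the _run_agent/_stream_agent helpers are placeholders invented for this sketch, not part of your agent; replace their bodies with calls into your LangGraph graph.

python
from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
    ChatChoice,
    ChatChoiceDelta,
    ChatChunkChoice,
    ChatCompletionChunk,
    ChatCompletionResponse,
    ChatMessage,
)

class StreamingAgent(ChatModel):
    def predict(self, context, messages, params=None):
        # Non-streaming path: build and return one complete response.
        answer = self._run_agent(messages)
        return ChatCompletionResponse(
            choices=[ChatChoice(index=0, message=ChatMessage(role="assistant", content=answer))]
        )

    def predict_stream(self, context, messages, params=None):
        # Streaming path: yield ChatCompletionChunk objects as tokens arrive.
        for token in self._stream_agent(messages):
            yield ChatCompletionChunk(
                choices=[ChatChunkChoice(index=0, delta=ChatChoiceDelta(role="assistant", content=token))]
            )

    def _run_agent(self, messages):
        # Placeholder: invoke your LangGraph/LangChain agent and return the full answer.
        return "full answer"

    def _stream_agent(self, messages):
        # Placeholder: iterate over your agent's streaming output (e.g. graph.stream(...)).
        yield from ["partial ", "answer"]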

3. Register and Deploy the Streaming Model

  • Save the model using the appropriate MLflow saving function that retains the streaming capabilities.

  • When registering/deploying, the model artifact must expose predict_stream (an example flow is shown after this list).
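
As a sketch of that flow (assuming the StreamingAgent class from the sketch above, a Unity Catalog registry, and a hypothetical model name main.default.streaming_agent):

python
import mlflow
from databricks import agents

mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run():
    # Logging a ChatModel as a pyfunc keeps predict_stream available on the served model.
    model_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model=StreamingAgent(),
        registered_model_name="main.default.streaming_agent",  # hypothetical UC name
    )

# Deploy the registered version with the Databricks agent framework.
agents.deploy("main.default.streaming_agent", model_info.registered_model_version)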

4. Check Your Deployment Code

When deploying the agent, ensure your inference endpoint is properly configured to use the streaming schema per the latest MLflow documentation.
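
Before redeploying, one way to confirm the logged model actually exposes streaming is to load it back with pyfunc and call predict_stream locally (supported in recent MLflow versions). The request payload below is illustrative and assumes the standard chat schema:

python
import mlflow

loaded = mlflow.pyfunc.load_model(model_info.model_uri)

request = {"messages": [{"role": "user", "content": "Hello, please stream your answer"}]}
for chunk in loaded.predict_stream(request):
    print(chunk)  # each chunk is a partial (streamed) piece of the response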

Example: MLflow Streaming ChatModel

python
import mlflow
from langchain_openai import ChatOpenAI  # requires the langchain-openai package

# Set up the LLM with streaming enabled
llm = ChatOpenAI(temperature=0.1, streaming=True)

# Save it with the MLflow LangChain flavor
mlflow.langchain.save_model(lc_model=llm, path="llm_model_streaming")
  • Ensure the ChatModel (ChatOpenAI, MosaicML, etc.) supports streaming out of the box and is saved with that capability.
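
If you take the LangChain-flavor route shown above, a quick sanity check is to reload the saved model and stream from it directly. This sketch assumes the llm_model_streaming path from the example and valid OpenAI credentials in the environment:

python
import mlflow

# Reload the LangChain chat model saved above and stream tokens from it.
reloaded = mlflow.langchain.load_model("llm_model_streaming")
for chunk in reloaded.stream("Tell me a short joke"):
    print(chunk.content, end="", flush=True)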

References to the Official Docs

See the official Databricks agent streaming guide and the MLflow ChatModel/streaming documentation to confirm the streaming interface is present and properly implemented when you save and subsequently deploy the model.


Key Steps to Fix

  • Verify that your saving function in MLflow (e.g. save_model()) saves a streaming-capable ChatModel.

  • Re-register the model in MLflow after confirming that the underlying implementation is compatible with streaming.

  • Update deployment code or configs to use the streaming endpoint (predict_stream).

If the underlying LLM class or deployment does not support streaming, you must swap to a compatible class and redeploy.


Error Causes and Resolutions

  • Cause: Model lacks a predict_stream method. Resolution: Save with a streaming-capable ChatModel.
  • Cause: Wrong MLflow save function or model class. Resolution: Use mlflow.langchain.save_model (or log a pyfunc ChatModel).
  • Cause: LLM streaming not enabled in config. Resolution: Set streaming=True in the LLM parameters.

Implement these corrections, re-save and deploy your MLflow model, and the streaming output should work for your agent in Databricks.
