I am deploying an agent that works well without streaming. It uses the following packages:
"mlflow==2.22.1",
"langgraph",
"langchain",
"pydantic==2.8.2",
"langgraph-checkpoint-sqlite",
"databricks-langchain",
"pypdf",
"databricks-vectorsearch",
"langchain_core",
"databricks-feature-store>=0.13.0",
"nest_asyncio",
"databricks-sdk==0.50.0",
"databricks-agents==0.20.0"
When a streaming response is requested from the deployed endpoint, the serving logs show:
[7gwmf] [2025-06-17 17:21:50 +0000] Encountered an unexpected error while parsing the input data. Error 'This model does not support predict_stream method.'
[7gwmf] Traceback (most recent call last):
[7gwmf] File "/opt/conda/envs/mlflow-env/lib/python3.11/site-packages/mlflowserving/scoring_server/__init__.py", line 670, in transformation
[7gwmf] raise MlflowException("This model does not support predict_stream method.")
[7gwmf] mlflow.exceptions.MlflowException: This model does not support predict_stream method.
The MLflow release page https://mlflow.org/releases/2.19.0 says:
ChatModel enhancements - ChatModel now adopts ChatCompletionRequest and ChatCompletionResponse as its new schema. The predict_stream interface uses ChatCompletionChunk to deliver true streaming responses. Additionally, the custom_inputs and custom_outputs fields in ChatModel now utilize AnyType, enabling support for a wider variety of data types. Note: In a future version of MLflow, ChatParams (and by extension, ChatCompletionRequest) will have the default values for n, temperature, and stream removed. (#13782, #13857, @stevenchen-db)
What do I need to do to correctly implement streaming for the LLM agent I am working on?
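Based on that release note, here is a rough sketch of what I think the model would need to implement, assuming my agent gets wrapped in an mlflow.pyfunc.ChatModel subclass. MyStreamingAgent, _run_agent, and _stream_agent are placeholder names for my existing LangGraph code, not real functions:

```python
from typing import Generator

from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
    ChatChoice,
    ChatChoiceDelta,
    ChatChunkChoice,
    ChatCompletionChunk,
    ChatCompletionResponse,
    ChatMessage,
    ChatParams,
)


class MyStreamingAgent(ChatModel):
    """Placeholder wrapper around my existing (non-streaming) LangGraph agent."""

    def predict(
        self, context, messages: list[ChatMessage], params: ChatParams
    ) -> ChatCompletionResponse:
        # Non-streaming path: run the agent once and return the full answer.
        answer = self._run_agent(messages)
        return ChatCompletionResponse(
            choices=[ChatChoice(message=ChatMessage(role="assistant", content=answer))]
        )

    def predict_stream(
        self, context, messages: list[ChatMessage], params: ChatParams
    ) -> Generator[ChatCompletionChunk, None, None]:
        # Streaming path: yield one ChatCompletionChunk per piece of output,
        # e.g. per token/update coming out of the graph's streaming interface.
        for token in self._stream_agent(messages):
            yield ChatCompletionChunk(
                choices=[
                    ChatChunkChoice(
                        delta=ChatChoiceDelta(role="assistant", content=token)
                    )
                ]
            )

    def _run_agent(self, messages: list[ChatMessage]) -> str:
        # Placeholder: invoke the compiled LangGraph graph and return the final text.
        raise NotImplementedError

    def _stream_agent(self, messages: list[ChatMessage]):
        # Placeholder: iterate over the graph's streamed output and yield text pieces.
        raise NotImplementedError
```

If that is the general idea, is it enough to re-log the model with mlflow.pyfunc.log_model(python_model=MyStreamingAgent(), ...) and redeploy the endpoint, or does the serving endpoint need something else before it exposes predict_stream?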