Errors using Dolly Deployed as a REST API
07-11-2023 03:54 PM
We have deployed Dolly (https://huggingface.co/databricks/dolly-v2-3b) as a REST API endpoint on our infrastructure. The notebook we used to do this is included in the text below my question.
The Databricks cluster used had the following config: 13.2 ML runtime, GPU, Spark 3.4.0, g5.2xlarge.
Dolly executes perfectly in-notebook, without any issues. We created two chains in LangChain to test execution: the first is a vanilla chain that answers questions directly, with no context provided; the second is a contextual Q&A chain. Both worked perfectly.
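For reference, the two chains differ mainly in their prompt templates. A plain-Python sketch of the two prompt shapes (hypothetical wording, standing in for the LangChain `PromptTemplate` objects in the notebook):

```python
# Plain-Python stand-ins for the two LangChain prompt templates
# (hypothetical wording; the real notebook uses PromptTemplate objects).

# 1) Vanilla chain: the question is the only input variable.
vanilla_template = "{instruction}"

# 2) Contextual Q&A chain: the prompt also takes a context passage.
context_template = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {instruction}\n\n"
    "Answer:"
)

def render(template: str, **kwargs: str) -> str:
    """Fill in the template, mimicking PromptTemplate.format()."""
    return template.format(**kwargs)

prompt = render(
    context_template,
    context="Dolly v2-3b is a 3B-parameter instruction-tuned model.",
    instruction="How many parameters does dolly-v2-3b have?",
)
```

The point relevant to the error below: once such a chain is logged, the serving input must supply exactly these template variables (`instruction`, and `context` for the Q&A chain).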
Model creation, registration, and deployment all proceed smoothly, without any issues, although they can take a long time (sometimes 20+ minutes).
Problems arise when we try to access the model indirectly through either the REST interface or by loading a logged and registered model using its URI. I've uploaded the error we see in the attached image.
From here, we tried a variety of things to debug the error and fix it ourselves, but to no avail. We have tried changing the input format (passing lists instead of strings, DataFrames instead of strings), changing runtime versions, changing the way we log the model (mlflow.pyfunc.log_model instead of mlflow.langchain.log_model), and experimenting with a variety of JSON formats consistent with the MLflow documentation for JSON inputs to model-serving REST APIs.
In all cases, we get this error. From our debugging attempts, it appears that the prompt being formed is somehow returning None, but explicitly passing a prompt as an argument is not allowed once a model has been logged as a LangChain model (in other words, the input schema is fixed when the model is logged, and inputs must use the keyword names defined in the prompt).
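For reference, the JSON envelopes the MLflow 2.x scoring protocol accepts look like the following. This is a sketch: the `instruction` column name is hypothetical and must match the input keys the logged prompt actually declares.

```python
import json

# The MLflow 2.x scoring protocol accepts several request envelopes.
# The column name "instruction" is hypothetical; it must match the input
# variables of the prompt the LangChain model was logged with.

question = "What is Dolly v2?"

payloads = {
    # pandas-oriented formats:
    "dataframe_records": json.dumps(
        {"dataframe_records": [{"instruction": question}]}),
    "dataframe_split": json.dumps(
        {"dataframe_split": {"columns": ["instruction"],
                             "data": [[question]]}}),
    # tensor/list-oriented formats:
    "inputs": json.dumps({"inputs": [question]}),
    "instances": json.dumps({"instances": [{"instruction": question}]}),
}

for name, body in payloads.items():
    print(name, "->", body)
```

If the served chain still sees None for its prompt variables with every one of these shapes, the mismatch is likely between the envelope's column/key names and the prompt's input variables rather than the envelope itself.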
We've spent a lot of time and GPU cycles trying to get this rather straightforward use case to work. Does anyone in this community have any insight into what we might be doing wrong here?
Any help would be greatly appreciated!
Thanks in Advance!
-----------------begin notebook code-----------------------
02-22-2024 07:13 AM
I had a similar problem when I used HuggingFacePipeline(pipeline=generate_text) with LangChain. It worked for me when I used HuggingFaceHub instead, with the same dolly-v2-3b model.

