<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Problem when serving a langchain model on Databricks in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63751#M3117</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;, thanks for your valuable inputs. I have a question about your code:&lt;/P&gt;&lt;PRE&gt;embedding_model = DatabricksEmbeddings(endpoint="databricks-bge-large-en")&lt;/PRE&gt;&lt;P&gt;You need UC enabled, right? In case I don't have UC enabled, could I use HuggingFace embeddings instead with DatabricksVectorSearch?&lt;/P&gt;</description>
    <pubDate>Fri, 15 Mar 2024 01:12:28 GMT</pubDate>
    <dc:creator>marcelo2108</dc:creator>
    <dc:date>2024-03-15T01:12:28Z</dc:date>
    <item>
      <title>Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59506#M2966</link>
      <description>&lt;P&gt;I'm trying to serve an LLM LangChain model and every time it fails with this message:&lt;/P&gt;&lt;PRE&gt;[6b6448zjll] [2024-02-06 14:09:55 +0000] [1146] [INFO] Booting worker with pid: 1146
[6b6448zjll] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.&lt;/PRE&gt;&lt;P&gt;I'm trying to enable it using:&lt;/P&gt;&lt;PRE&gt;"scale_to_zero_enabled": "False",
"workload_type": "GPU_SMALL",
"workload_size": "Small",&lt;/PRE&gt;&lt;P&gt;I tried using code and using the UI, and it shows this error every time. I'm logging the model successfully as follows:&lt;/P&gt;&lt;PRE&gt;import mlflow
import langchain
from mlflow.models import infer_signature

with mlflow.start_run() as run:
    signature = infer_signature(question, answer)
    logged_model = mlflow.langchain.log_model(
        lc_model=llm_chain,
        artifact_path="model",
        registered_model_name="llamav2-llm-chain",
        metadata={"task": "llm/v1/completions"},
        pip_requirements=["mlflow==" + mlflow.__version__, "langchain==" + langchain.__version__],
        signature=signature,
        await_registration_for=900  # wait for 15 minutes for model registration to complete
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)&lt;/PRE&gt;</description>
      <pubDate>Tue, 06 Feb 2024 19:06:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59506#M2966</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-06T19:06:16Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59624#M2973</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thanks for your response. I tried a couple of your recommendations with no luck so far. What I did so far:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Check Model Configuration:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class="lia-list-style-type-disc"&gt;&lt;LI&gt;Ensure that you’ve correctly configured the model. Double-check the settings related to scale_to_zero_enabled, workload_type, and workload_size. Make sure they match your intended setup.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;R:&lt;/STRONG&gt; I did a similar (not identical) configuration, compared with 02-Deploy-RAG-Chatbot-Model (LLM with RAG on Databricks - dbdemos). Here is what I did on this subject:&lt;/P&gt;&lt;PRE&gt;w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_SMALL",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # &amp;lt;scope&amp;gt;/&amp;lt;secret&amp;gt; that contains an access token
            }
        )
    ]
)&lt;/PRE&gt;&lt;P&gt;Also, I'm using FAISS as the vector store with the FAISS GPU package:&lt;/P&gt;&lt;PRE&gt;model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda:0"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# storing embeddings in the vector store
vectorstore = FAISS.from_documents(all_splits, embeddings)&lt;/PRE&gt;&lt;P&gt;and with save_local and load:&lt;/P&gt;&lt;PRE&gt;# Persist to be ready for mlflow
persist_directory = "langchain/faiss_index"
vectorstore.save_local(persist_directory)

def get_retriever(persist_dir: str = None):
    if persist_dir is None:
        db = FAISS.load_local("langchain/faiss_index", embeddings)
    else:
        db = FAISS.load_local(persist_dir, embeddings)
    return db.as_retriever()&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Model Name Mapping:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class="lia-list-style-type-disc"&gt;&lt;LI&gt;Sometimes, errors like this occur because the model name isn’t included in the model_token_mapping dictionary. To resolve this, add your model (e.g., “gpt-35-turbo-16k”) to the dictionary along with its correspo....&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;R: I don't even know how to implement this. It seems to be a static method, but where do I put the code below? Do you have a tip?&lt;/P&gt;&lt;PRE&gt;@staticmethod
def modelname_to_contextsize(modelname: str) -&amp;gt; int:
    model_token_mapping = {
        # ... existing model mappings ...
        "gpt-35-turbo-16k": &amp;lt;max_context_size_for_this_model&amp;gt;,  # Add your model here
    }

    # rest of the method...&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Output Format Alignment:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class="lia-list-style-type-disc"&gt;&lt;LI&gt;Verify that the output format of your LLM aligns with what your agent expects. If necessary, adjust the parsing logic to handle the specific output format of your custom LLM.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;R:&lt;/STRONG&gt; I think that is OK; I did it this way, and when I test it in the Databricks notebook it works fine:&lt;/P&gt;&lt;PRE&gt;def transform_output(response):
    return str(response)

llm3 = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf', extra_params={"temperature": 0.0001, "max_tokens": 120}, transform_output_fn=transform_output)  # SAME RESULT

# llm = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf', extra_params={"temperature": 0.0001, "max_tokens": 120})

# input_text = "What is apache spark?"
input_text = "Qual o tipo do campo WarehouseBalance ?"

print(llm3.predict(input_text))&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Prompt Assignment:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class="lia-list-style-type-disc"&gt;&lt;LI&gt;When iterating over LLM models, try assigning the prompt inline instead of using a variable. For example: chain = LLMChain(llm=llm_model, prompt=PromptTemplate(template=template, input_variables=['context', 'prompt']))&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;R: I think that is OK; I did it this way:&lt;/P&gt;&lt;PRE&gt;from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)&lt;/PRE&gt;&lt;P&gt;Some actions I'm planning to do:&lt;BR /&gt;1) Implement another vector store such as Chroma (I think it will not use GPU)&lt;BR /&gt;2) Implement the model name mapping. However, I don't know where to put the code.&lt;/P&gt;&lt;P&gt;Any thoughts?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 22:43:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59624#M2973</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-07T22:43:58Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59722#M2978</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;About the actions I have taken:&lt;BR /&gt;&lt;BR /&gt;1) Implement another vector store such as Chroma (I think it will not use GPU) - It didn't work. I changed to CPU with Chroma as the vector store and it showed the same issue:&lt;BR /&gt;[86b54lclcl] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.&lt;BR /&gt;&lt;BR /&gt;2) Implement the model name mapping. However, I don't know where to put the code. Do you have any information on how to implement this?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 20:19:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59722#M2978</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-08T20:19:59Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59795#M2985</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I tested another way, putting a conda_env parameter instead of pip_requirements, and no luck so far:&lt;/P&gt;&lt;PRE&gt;conda_env = {
    "name": "mlflow-env",
    "channels": ["defaults"],  # it was conda-forge
    "dependencies": [
        "python=3.10.12",
        "gunicorn=20.1.0",
        {
            "pip": ["mlflow==" + mlflow.__version__, "langchain==" + langchain.__version__, "sentence_transformers", "chromadb"],
        },
    ],
}&lt;/PRE&gt;&lt;P&gt;Has anyone else run into this problem when serving an LLM model with LangChain and Llama? Llama was previously enabled as a custom model with success in Databricks. However, the problem happens when I use LangChain with a model loaded as follows:&lt;/P&gt;&lt;PRE&gt;from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)&lt;/PRE&gt;&lt;P&gt;and deploy it as a serving model with:&lt;/P&gt;&lt;PRE&gt;w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_LARGE",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # &amp;lt;scope&amp;gt;/&amp;lt;secret&amp;gt; that contains an access token
            }
        )
    ]
)&lt;/PRE&gt;&lt;P&gt;It shows this message in Databricks when the serving model fails:&lt;/P&gt;&lt;PRE&gt;[58c45b9xxw] [2024-02-09 14:20:06 +0000] [495] [INFO] Worker exiting (pid: 495)
[58c45b9xxw] [2024-02-09 14:20:06 +0000] [589] [INFO] Booting worker with pid: 589
[58c45b9xxw] /opt/conda/envs/mlflow-env/lib/python3.10/site-packages/pydantic/_internal/_config.py:322: UserWarning: Valid config keys have changed in V2:
[58c45b9xxw] * 'schema_extra' has been renamed to 'json_schema_extra'
[58c45b9xxw] warnings.warn(message, UserWarning)
[58c45b9xxw] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.&lt;/PRE&gt;</description>
      <pubDate>Fri, 09 Feb 2024 14:46:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59795#M2985</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-09T14:46:23Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/60445#M3007</link>
      <description>&lt;P&gt;I tried to run in another cell something like this:&lt;/P&gt;&lt;PRE&gt;!/opt/conda/envs/mlflow-env/bin/gunicorn configure&lt;/PRE&gt;&lt;P&gt;and it showed the error:&lt;/P&gt;&lt;PRE&gt;/bin/bash: line 1: /opt/conda/envs/mlflow-env/bin/gunicorn: No such file or directory&lt;/PRE&gt;</description>
      <pubDate>Fri, 16 Feb 2024 20:37:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/60445#M3007</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-16T20:37:08Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62635#M3072</link>
      <description>&lt;P&gt;Hi guys, we encountered the same issue. Do we have a resolution to this?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 10:29:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62635#M3072</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-05T10:29:18Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62638#M3073</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79528"&gt;@marcelo2108&lt;/a&gt;, do you have any progress on this one? We encountered the same issue while deploying a RAG chatbot in Databricks.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 11:02:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62638#M3073</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-05T11:02:25Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62676#M3074</link>
      <description>&lt;P&gt;Same issue&lt;/P&gt;&lt;P&gt;An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 16:22:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62676#M3074</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-05T16:22:12Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62677#M3075</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/101712"&gt;@SwaggerP&lt;/a&gt; and &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;. Yes, I still have the same issue and no solution so far.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 16:24:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62677#M3075</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-03-05T16:24:48Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62866#M3080</link>
      <description>&lt;P&gt;I was having the same issue deploying a custom pyfunc model, and eventually found a fix by deploying one function at a time to isolate where the issue was. Mine was caused by the vector search component - I was using self-managed embeddings, and it was the initialising of the embedding client and the vector search client `VectorSearchClient()` in load_context() that was causing this issue. Moving the initialisation of all clients to within the functions they were used in solved it for me. Good luck with your models.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 11:02:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62866#M3080</guid>
      <dc:creator>bengidlow</dc:creator>
      <dc:date>2024-03-07T11:02:50Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62901#M3081</link>
      <description>&lt;P&gt;Thanks for the hint &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/101881"&gt;@bengidlow&lt;/a&gt;; however, this did not work for me. I'm just using the dbdemo, so I'm confused why it doesn't just work.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 13:13:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62901#M3081</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-07T13:13:02Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62907#M3082</link>
      <description>&lt;P&gt;Same with me. Just using the dbdemo. It doesn't work.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 14:59:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62907#M3082</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-07T14:59:54Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63052#M3085</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt; any help would be greatly appreciated... it seems like dbdemos should just work.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Mar 2024 12:46:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63052#M3085</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-08T12:46:20Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63054#M3086</link>
      <description>&lt;P&gt;The issue seems to be in the get_retriever() function at:&lt;/P&gt;&lt;PRE&gt;    vectorstore = DatabricksVectorSearch(
        vs_index, text_column="content", embedding=embedding_model, columns=["url"]
    )&lt;/PRE&gt;</description>
      <pubDate>Fri, 08 Mar 2024 13:19:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63054#M3086</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-08T13:19:29Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63116#M3090</link>
      <description>&lt;P&gt;I tried enhancing the said function, even declaring imports inside it. Still the same error.&lt;/P&gt;</description>
      <pubDate>Sat, 09 Mar 2024 07:57:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63116#M3090</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-09T07:57:58Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63124#M3092</link>
      <description>&lt;P&gt;All, I've fixed the error. Though, to be honest, I'm not exactly sure what ended up doing it. I was trying to do it systematically, but I lost track. Nonetheless, I hope the code below helps.&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/101712"&gt;@SwaggerP&lt;/a&gt; &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79528"&gt;@marcelo2108&lt;/a&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def get_retriever(persist_dir: str = None):
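    # Note: following the fix reported earlier in this thread, every client (embeddings,
    # VectorSearchClient, vector store) is created inside this loader function rather than
    # at module level or in load_context(); initialising them outside the function was
    # reported to trigger the "You haven't configured the CLI yet" error at serving time.
    # The persist_dir argument is unused here; it only mirrors the earlier FAISS-based
    # get_retriever signature.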
    import gunicorn
    from databricks.vector_search.client import VectorSearchClient
    from langchain_community.vectorstores import DatabricksVectorSearch
    from langchain_community.embeddings import DatabricksEmbeddings
    from langchain_community.chat_models import ChatDatabricks
    from langchain.chains import RetrievalQA
    import logging

    import traceback
    logging.basicConfig(filename='error.log', level=logging.DEBUG)
    
    
    print('libraries loaded')
    # token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
    embedding_model = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

    print('initialized embedding_model')

    #Get the vector search index
    vsc = VectorSearchClient(workspace_url=os.environ["DATABRICKS_HOST"], 
     personal_access_token=os.environ["DATABRICKS_TOKEN"],
     disable_notice=True                  
    )
    
    print('initialized VectorSearchClient')
    
    vs_index = vsc.get_index(
        endpoint_name='vectorsearch',
        index_name=vsIndexName
    )
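    # Assumption: vsIndexName is defined earlier in the notebook (the full index name,
    # e.g. catalog.schema.index_name), and 'vectorsearch' is the Vector Search endpoint
    # that hosts that index.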

    print('initialized vs_index')

    # Create the retriever
    try:
        print('trying to initialize vectorstore')

        vectorstore = DatabricksVectorSearch(
            vs_index, text_column="content", embedding=embedding_model, columns=["url"]
        )

        retriever = vectorstore.as_retriever(search_kwargs={'k': 4})

        print('initialized vectorstore')

        return  retriever
    except BaseException as e:
        print("An error occurred:", str(e))
        traceback.print_exc()


from langchain.vectorstores import DatabricksVectorSearch
import os
from langchain_community.chat_models import ChatDatabricks
from langchain.chains import RetrievalQA
from langchain import hub
prompt = hub.pull("rlm/rag-prompt", api_url="https://api.hub.langchain.com")
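# rlm/rag-prompt is a generic RAG prompt with "context" and "question" inputs; a local
# PromptTemplate such as the TEMPLATE used earlier in this thread should also work here.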

retriever = get_retriever()

chat_model = ChatDatabricks(endpoint="databricks-llama-2-70b-chat")


qa_chain = RetrievalQA.from_chain_type(
    chat_model,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)


import mlflow
import langchain
from mlflow.models import infer_signature



with mlflow.start_run(run_name=runName) as run:
    question = "qiestopm jere?"
    result = qa_chain({"query": question})
    signature = infer_signature(result['query'], result['result'])

    model_info = mlflow.langchain.log_model(
        qa_chain,
        loader_fn=get_retriever,  # Load the retriever with DATABRICKS_TOKEN env as secret (for authentication).
        artifact_path="chain",
        registered_model_name=fq_model_name,
        pip_requirements=[
            "mlflow",
            "langchain",
            "langchain_community",
            "databricks-vectorsearch",
            "pydantic==2.5.2 --no-binary pydantic",
            "cloudpickle",
            "langchainhub"
        ],
        input_example=result,
        signature=signature,
    )
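# The explicit pip_requirements above (langchain_community, databricks-vectorsearch,
# langchainhub, and the pydantic pin) keep the serving container consistent with the
# notebook environment; the pydantic pin lines up with the pydantic V2 warnings seen in
# the serving logs earlier in this thread.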


import urllib
import json
import mlflow
import requests
import time
from mlflow.tracking import MlflowClient


client = MlflowClient()
model_name = f"{fq_model_name}"
serving_endpoint_name = servingName



#TODO: use the sdk once model serving is available.
serving_client = EndpointApiClient()
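# Assumption: EndpointApiClient comes from the dbdemos helper notebooks, not the
# Databricks SDK; earlier posts in this thread used WorkspaceClient() with
# EndpointCoreConfigInput / ServedModelInput from the SDK instead.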


auto_capture_config = {
    "catalog_name": catalog,
    "schema_name": db,
    "table_name_prefix": serving_endpoint_name
    } 
environment_vars={
  "DATABRICKS_HOST" : "{{secrets/azurekeyvault/hostsecrethere}}",
  "DATABRICKS_TOKEN" : "{{secrets/azurekeyvault/pathere}}"
}
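# DATABRICKS_HOST and DATABRICKS_TOKEN are read by get_retriever() inside the serving
# container, so they are injected here as secret-backed environment variables
# ({{secrets/scope/key}} references; here they point at an Azure Key Vault-backed scope).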

serving_client.create_endpoint_if_not_exists(serving_endpoint_name, 
                                             model_name=model_name.lower(), 
                                             model_version = 33, 
                                             workload_size="Small", 
                                             scale_to_zero_enabled=True, 
                                             wait_start = True, 
                                             auto_capture_config=auto_capture_config, 
                                             environment_vars=environment_vars
                                             )&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 09 Mar 2024 17:09:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63124#M3092</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-09T17:09:56Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63682#M3111</link>
      <description>&lt;P&gt;Thank you &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;.&lt;BR /&gt;Mine is now successfully deployed, but I am now facing this 'Forbidden for url' issue whenever I query the endpoint.&lt;BR /&gt;In our workspace, PATs are not allowed, hence we need to use a service principal.&lt;/P&gt;&lt;P&gt;Is the probable cause the service principal?&lt;/P&gt;&lt;P&gt;403 Client Error: Forbidden for url: /serving-endpoints/databricks-mixtral-8x7b-instruct/invocations&lt;/P&gt;</description>
      <pubDate>Thu, 14 Mar 2024 11:48:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63682#M3111</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-14T11:48:45Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63744#M3116</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/101712"&gt;@SwaggerP&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;&amp;nbsp; Any solution?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Mar 2024 20:23:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63744#M3116</guid>
      <dc:creator>ADS1</dc:creator>
      <dc:date>2024-03-14T20:23:33Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63751#M3117</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;, thanks for your valuable inputs. I have a question about your code:&lt;/P&gt;&lt;PRE&gt;embedding_model = DatabricksEmbeddings(endpoint="databricks-bge-large-en")&lt;/PRE&gt;&lt;P&gt;You need UC enabled, right? In case I don't have UC enabled, could I use HuggingFace embeddings instead with DatabricksVectorSearch?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2024 01:12:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63751#M3117</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-03-15T01:12:28Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/64675#M3154</link>
      <description>&lt;P&gt;BGE is part of the Foundation Model APIs, so there is no need for Unity Catalog for this. Mine is also deployed successfully.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2024 15:24:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/64675#M3154</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-26T15:24:25Z</dc:date>
    </item>
  </channel>
</rss>

