Databricks Community

marcelo2108 · ‎02-06-2024

I'm trying to deploy a LangChain LLM model with Model Serving, and every time it fails with this message:

[6b6448zjll] [2024-02-06 14:09:55 +0000] [1146] [INFO] Booting worker with pid: 1146
[6b6448zjll] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.

I'm trying to enable it using:

"scale_to_zero_enabled": "False",
"workload_type": "GPU_SMALL",
"workload_size": "Small",

I tried both code and the UI, and it shows this error every time.
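For the settings quoted above, one thing that may be worth checking is that scale_to_zero_enabled is sent as a boolean rather than the string "False". The following is a minimal sketch of building such a request body in Python. The outer layout ("name", "config", "served_models"), the function name build_endpoint_payload, the endpoint name "my-endpoint", and version "1" are assumptions for illustration, not taken from this thread.

```python
import json


def build_endpoint_payload(endpoint_name, model_name, model_version):
    """Build a request body for a serving endpoint (layout is an assumption)."""
    return {
        "name": endpoint_name,
        "config": {
            "served_models": [
                {
                    "model_name": model_name,
                    "model_version": model_version,
                    "workload_size": "Small",
                    "workload_type": "GPU_SMALL",
                    # Python bool: serialized as JSON false, not the string "False"
                    "scale_to_zero_enabled": False,
                }
            ]
        },
    }


# Hypothetical endpoint name and model version, for illustration only
payload = build_endpoint_payload("my-endpoint", "llamav2-llm-chain", "1")
print(json.dumps(payload, indent=2))
```

This does not address the "You haven't configured the CLI yet!" error itself; it only rules out one possible configuration mismatch.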
I'm logging the model successfully as follows:

import mlflow
import langchain
from mlflow.models import infer_signature

with mlflow.start_run() as run:
    signature = infer_signature(question, answer)
    logged_model = mlflow.langchain.log_model(
        lc_model=llm_chain,
        artifact_path="model",
        registered_model_name="llamav2-llm-chain",
        metadata={"task": "llm/v1/completions"},
        pip_requirements=["mlflow==" + mlflow.__version__, "langchain==" + langchain.__version__],
        signature=signature,
        await_registration_for=900,  # wait for 15 minutes for model registration to complete
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)

marcelo2108 · ‎02-07-2024

Hi @Retired_mod, thanks for your response. I tried a couple of your recommendations, with no luck so far. Here is what I did:

Check Model Configuration:

Ensure that you’ve correctly configured the model. Double-check the settings related to scale_to_zero_enabled, workload_type, and workload_size. Make sure they match your intended setup.

R: My configuration is similar, but not identical, to the one in 02-Deploy-RAG-Chatbot-Model (LLM with RAG on Databricks - dbdemos). Here is what I did:

w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_SMALL",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # <scope>/<secret> that contains an access token
            }
        )
    ]
)

I'm also using FAISS as the vector store, with the FAISS GPU package.

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda:0"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# storing embeddings in the vector store
vectorstore = FAISS.from_documents(all_splits, embeddings)

and persisting it with save_local and load_local:

# Persist to be ready for mlflow
persist_directory = "langchain/faiss_index"
vectorstore.save_local(persist_directory)

def get_retriever(persist_dir: str = None):
    if persist_dir is None:
        db = FAISS.load_local("langchain/faiss_index", embeddings)
    else:
        db = FAISS.load_local(persist_dir, embeddings)
    return db.as_retriever()

Model Name Mapping:

Sometimes, errors like this occur because the model name isn’t included in the model_token_mapping dictionary. To resolve this, add your model (e.g., “gpt-35-turbo-16k”) to the dictionary along with its correspo....

R: I don't know how to implement this. It seems to be a static method, but where do I put the code below? Do you have a tip?

@staticmethod
def modelname_to_contextsize(modelname: str) -> int:
    model_token_mapping = {
        # ... existing model mappings ...
        "gpt-35-turbo-16k": <max_context_size_for_this_model>,  # Add your model here
    }

    # rest of the method..

Output Format Alignment:

Verify that the output format of your LLM aligns with what your agent expects. If necessary, adjust the parsing logic to handle the specific output format of your custom LLM.

R: I think this is OK. I set it up this way, and it works fine when I test it in a Databricks notebook.

def transform_output(response):
    return str(response)

llm3 = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf', extra_params={"temperature": 0.0001, "max_tokens": 120}, transform_output_fn=transform_output)  # SAME RESULT
#llm = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf',extra_params={"temperature":0.0001,"max_tokens": 120})

#input_text = "What is apache spark?"
input_text = "Qual o tipo do campo WarehouseBalance ?"  # Portuguese: "What is the type of the WarehouseBalance field?"
print(llm3.predict(input_text))

Prompt Assignment:

When iterating over LLM models, try assigning the prompt inline instead of using a variable. For example: chain = LLMChain(llm=llm_model, prompt=PromptTemplate(template=template, input_variables=['context', 'prompt']))

R: I think this is OK. I set it up this way:

from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)
Some actions I'm planning to take:
1) Try another vector store such as Chroma (I think it will not use the GPU).
2) Implement the model name mapping. However, I don't know where to put the code.

Any thoughts?

Thanks

marcelo2108 · ‎02-08-2024

Hi @Retired_mod

Here is an update on the actions I planned:

1) Try another vector store such as Chroma (I thought it would not use the GPU): it didn't work. I switched to CPU with Chroma as the vector store, and it showed the same issue:
[86b54lclcl] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.

2) Implement the model name mapping: I still don't know where to put the code.
Do you have any information on how to implement this?

marcelo2108 · ‎02-09-2024

Hi All

I tried another approach, passing a conda_env parameter instead of pip_requirements, with no luck so far:

conda_env={
    "name": "mlflow-env",
    "channels": ["defaults"],  # it was conda-forge
    "dependencies": [
        "python=3.10.12",
        "gunicorn=20.1.0",
        {
            "pip": ["mlflow==" + mlflow.__version__, "langchain==" + langchain.__version__, "sentence_transformers", "chromadb"],
        },
    ],
}

Has anyone run into this problem when serving an LLM with LangChain and Llama? Llama was previously deployed successfully as a custom model in Databricks. However, the problem appears when I use LangChain with a chain built as follows:

from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)

and deploy it as a serving endpoint with:

w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_LARGE",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # <scope>/<secret> that contains an access token
            }
        )
    ]
)
Databricks shows this message when the serving endpoint fails:

[58c45b9xxw] [2024-02-09 14:20:06 +0000] [495] [INFO] Worker exiting (pid: 495)
[58c45b9xxw] [2024-02-09 14:20:06 +0000] [589] [INFO] Booting worker with pid: 589
[58c45b9xxw] /opt/conda/envs/mlflow-env/lib/python3.10/site-packages/pydantic/_internal/_config.py:322: UserWarning: Valid config keys have changed in V2:
[58c45b9xxw] * 'schema_extra' has been renamed to 'json_schema_extra'
[58c45b9xxw] warnings.warn(message, UserWarning)
[58c45b9xxw] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.

marcelo2108 · ‎02-16-2024

I tried running something like this in another cell:

!/opt/conda/envs/mlflow-env/bin/gunicorn configure

and it showed this error:
/bin/bash: line 1: /opt/conda/envs/mlflow-env/bin/gunicorn: No such file or directory
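The "No such file or directory" result suggests that the path named in the error message does not exist on the machine running the notebook. A minimal sketch for checking whether an executable is present before trying to run it, using only the standard library (the helper name find_executable is hypothetical):

```python
import shutil


def find_executable(path_or_name):
    """Return the runnable path for a command name or full path, or None if absent."""
    # shutil.which checks a path with a directory part directly,
    # and searches PATH for a bare command name.
    return shutil.which(path_or_name)


gunicorn_path = "/opt/conda/envs/mlflow-env/bin/gunicorn"
if find_executable(gunicorn_path) is None:
    print(f"{gunicorn_path} is not available in this environment")
```

If the check reports the file as missing, running the suggested configure command from a notebook cell will not help, since the binary lives in a different environment.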

SwaggerP · ‎03-05-2024

Hi @marcelo2108, have you made any progress on this one? We are encountering the same issue while deploying a RAG chatbot in Databricks.

SwaggerP · ‎03-05-2024

Hi guys, we encountered the same issue. Is there a resolution for this?

DataWrangler · ‎03-05-2024

Same issue

An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`

marcelo2108 · ‎03-05-2024

Hi @SwaggerP and @DataWrangler. Yes, I still have the same issue, with no solution so far.

bengidlow · ‎03-07-2024

I was having the same issue deploying a custom pyfunc model, and eventually found a fix by deploying one function at a time to isolate where the issue was. Mine was caused by the vector search component: I was using self-managed embeddings, and initialising the embedding client and the vector search client `VectorSearchClient()` in load_context() was causing this issue. Moving the initialisation of all clients into the functions where they are used solved it for me. Good luck with your models.
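The pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual code: LazyClientModel is a plain class that only mirrors the load_context/predict method names of mlflow.pyfunc.PythonModel so that it runs without MLflow, and FakeVectorSearchClient is a hypothetical stand-in for a real client such as VectorSearchClient(). Caching the client after first use is an addition of this sketch.

```python
class FakeVectorSearchClient:
    """Hypothetical stand-in for a real client such as VectorSearchClient()."""

    instances_created = 0

    def __init__(self):
        # A real client would read credentials and open connections here.
        FakeVectorSearchClient.instances_created += 1

    def search(self, query):
        return [f"doc for: {query}"]


class LazyClientModel:
    """Mirrors the load_context/predict shape of a pyfunc model (plain class here)."""

    def __init__(self, client_factory=FakeVectorSearchClient):
        self._client_factory = client_factory
        self._client = None

    def load_context(self, context):
        # Deliberately creates no clients at model-load time.
        pass

    def _get_client(self):
        # Create the client on first use, then reuse it.
        if self._client is None:
            self._client = self._client_factory()
        return self._client

    def predict(self, context, model_input):
        client = self._get_client()
        return [client.search(query) for query in model_input]


model = LazyClientModel()
model.load_context(None)          # no client is created here
print(model.predict(None, ["What is Apache Spark?"]))
```

With this layout, loading the model never touches the client, so any credential or configuration lookup the client performs happens at request time instead.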