Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Problem when serving a langchain model on Databricks

marcelo2108
Contributor

I'm trying to serve an LLM LangChain model with Model Serving, and every time it fails with this message:

[6b6448zjll] [2024-02-06 14:09:55 +0000] [1146] [INFO] Booting worker with pid: 1146
[6b6448zjll] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.

I'm trying to enable it with

"scale_to_zero_enabled": "False",            
"workload_type": "GPU_SMALL",
"workload_size": "Small",
I tried using code and using the UI, and it shows this error every time.
I'm logging the model successfully as follows:


import mlflow
import langchain
from mlflow.models import infer_signature

with mlflow.start_run() as run:
    signature = infer_signature(question, answer)
    logged_model = mlflow.langchain.log_model(
        lc_model=llm_chain,
        artifact_path="model",
        registered_model_name="llamav2-llm-chain",
        metadata={"task": "llm/v1/completions"},
        pip_requirements=["mlflow==" + mlflow.__version__,"langchain==" + langchain.__version__],
        signature=signature,
        await_registration_for=900 # wait for 15 minutes for model registration to complete
    )

# Load the logged chain back as a pyfunc model
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
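
For reference, a quick smoke test of the loaded model before deploying looks roughly like this; the "query" input key is an assumption for a RetrievalQA-style chain, so match it to whatever was passed to infer_signature above:

# Hedged smoke test of the loaded pyfunc model before creating the endpoint.
# The "query" key below is an assumption for a RetrievalQA-style chain.
print(loaded_model.predict([{"query": "What is Apache Spark?"}]))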


25 REPLIES

marcelo2108
Contributor

 

Hi @Retired_mod, thanks for your response. I tried a couple of your recommendations and no luck so far. What I did so far:

Check Model Configuration:

  • Ensure that you've correctly configured the model. Double-check the settings related to scale_to_zero_enabled, workload_type, and workload_size. Make sure they match your intended setup.

R: I did a similar (not identical) configuration, based on 02-Deploy-RAG-Chatbot-Model (LLM with RAG on Databricks - dbdemos). Here is what I did on this subject:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedModelInput

w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_SMALL",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # <scope>/<secret> that contains an access token
            }
        )
    ]
)
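
For completeness, the config above is only the input object; the endpoint itself is then created with the SDK, typically like the call below (the exact method name may vary between databricks-sdk versions):

# Create the serving endpoint from the config above; create_and_wait blocks
# until the endpoint is ready (or fails).
w.serving_endpoints.create_and_wait(name=serving_endpoint_name, config=endpoint_config)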

Also, I'm using FAISS as the vector store, with the FAISS GPU package.

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda:0"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# storing embeddings in the vector store
vectorstore = FAISS.from_documents(all_splits, embeddings)

and with save_local and load

#Persist to be ready for mlflow
persist_directory = "langchain/faiss_index"
vectorstore.save_local(persist_directory)

def get_retriever(persist_dir: str = None):
    if persist_dir is None:
        db = FAISS.load_local("langchain/faiss_index", embeddings)
    else:
        db = FAISS.load_local(persist_dir, embeddings)
    return db.as_retriever()
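
As a quick sanity check of the retriever outside model serving, something like this works in a notebook (the question string is just an example):

# Notebook sanity check of the FAISS retriever defined above
retriever = get_retriever()
docs = retriever.get_relevant_documents("What is the WarehouseBalance field?")
for d in docs:
    print(d.page_content[:200])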

Model Name Mapping:

  • Sometimes, errors like this occur because the model name isn't included in the model_token_mapping dictionary. To resolve this, add your model (e.g., "gpt-35-turbo-16k") to the dictionary along with its correspo...

R: I don't even know how to implement this. It seems to be a static method, but where do I put the code below? Do you have a tip?

@staticmethod
def modelname_to_contextsize(modelname: str) -> int:
    model_token_mapping = {
        # ... existing model mappings ...
        "gpt-35-turbo-16k": <max_context_size_for_this_model>,  # Add your model here
    }

    # rest of the method..
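
As far as I can tell, that recommendation refers to LangChain's OpenAI wrapper, so it probably doesn't apply to a Databricks-served Llama endpoint at all. If it did apply, a hypothetical place to put it would be a small subclass that overrides the method, roughly like this sketch (names and the 16384 context size are assumptions):

# Hypothetical sketch only: overriding the token mapping on LangChain's OpenAI
# wrapper. Likely irrelevant for a Databricks-served Llama model, which has no
# such mapping.
from langchain.llms import OpenAI

class PatchedOpenAI(OpenAI):
    def modelname_to_contextsize(self, modelname: str) -> int:
        if modelname == "gpt-35-turbo-16k":
            return 16384  # assumed context window for this model
        return super().modelname_to_contextsize(modelname)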

Output Format Alignment:

  • Verify that the output format of your LLM aligns with what your agent expects. If necessary, adjust the parsing logic to handle the specific output format of your custom LLM.

R: I think that one is OK. I set it up this way, and when I test it in a Databricks notebook it works fine.

def transform_output(response):
    return str(response)

llm3 = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf',extra_params={"temperature":0.0001,"max_tokens": 120},transform_output_fn=transform_output) #SAME RESULT

#llm = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf',extra_params={"temperature":0.0001,"max_tokens": 120})

#input_text = "What is apache spark?"

input_text = "Qual o tipo do campo WarehouseBalance ?"

print(llm3.predict(input_text))


Prompt Assignment:

  • When iterating over LLM models, try assigning the prompt inline instead of using a variable. For example: chain = LLMChain(llm=llm_model, prompt=PromptTemplate(template=template, input_variables=['context', 'prompt']))

R: I think that is OK. I set it up this way:

from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)
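
For what it's worth, testing this chain directly in a notebook (before logging and serving it) can be done as in the sketch below; in this LangChain version RetrievalQA takes a dict with a "query" key and returns a dict with a "result" key:

# Notebook test of the RetrievalQA chain above, before logging/serving it
result = chain({"query": input_text})
print(result["result"])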
Some actions I'm planning to do:
1) Implement another vector store such as Chroma (I think it will not use GPU)
2) Implement the model name mapping. However, I don't know where to put the code.

Any thoughts ?

Thanks

marcelo2108
Contributor

Hi @Retired_mod 

About the actions I have taken:

Some actions I was planning to do:
1) Implement another vector store such as Chroma (I think it will not use GPU) - it didn't work. I changed to CPU with Chroma as the vector store and it showed the same issue:
[86b54lclcl] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.

2) Implement the model name mapping. However, I don't know where to put the code.
Do you have any information on how to implement this?

marcelo2108
Contributor

Hi All

I tested another way, putting a conda_env parameter instead of pip_requirements, and no luck so far:

conda_env={
    "name": "mlflow-env",
    "channels": ["defaults"], #it was conda-forge
    "dependencies": [
        "python=3.10.12",
        "gunicorn=20.1.0",
        {
            "pip": ["mlflow==" + mlflow.__version__,"langchain==" + langchain.__version__,"sentence_transformers","chromadb"],
        },
    ],
}
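
For anyone trying to reproduce this, the dict above replaces pip_requirements in the same log_model call from the original post, roughly like this:

# Same log_model call as before, but with the custom conda_env
# instead of pip_requirements
logged_model = mlflow.langchain.log_model(
    lc_model=llm_chain,
    artifact_path="model",
    registered_model_name="llamav2-llm-chain",
    signature=signature,
    conda_env=conda_env,
)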

Has anyone else run into this problem when serving an LLM model with LangChain and Llama? Llama was previously enabled as a custom model in Databricks with success. However, the error appears when I use LangChain with a chain built
as follows:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
 
TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)

and deploy as a serving model with :
w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_LARGE",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # <scope>/<secret> that contains an access token
            }
        )
    ]
)
It shows this message in Databricks when the serving model fails:

[58c45b9xxw] [2024-02-09 14:20:06 +0000] [495] [INFO] Worker exiting (pid: 495)
[58c45b9xxw] [2024-02-09 14:20:06 +0000] [589] [INFO] Booting worker with pid: 589
[58c45b9xxw] /opt/conda/envs/mlflow-env/lib/python3.10/site-packages/pydantic/_internal/_config.py:322: UserWarning: Valid config keys have changed in V2:
[58c45b9xxw] * 'schema_extra' has been renamed to 'json_schema_extra'
[58c45b9xxw] warnings.warn(message, UserWarning)
[58c45b9xxw] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.



marcelo2108
Contributor

I tried to run something like this in another cell:

!/opt/conda/envs/mlflow-env/bin/gunicorn configure
and it showed the error:
/bin/bash: line 1: /opt/conda/envs/mlflow-env/bin/gunicorn: No such file or directory

 

SwaggerP
New Contributor III

Hi @marcelo2108, do you have any progress on this one? We encountered the same issue while deploying a RAG chatbot in Databricks.

SwaggerP
New Contributor III

Hi guys, we encountered the same issue. Is there a resolution to this?

DataWrangler
New Contributor III

Same issue

An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`

marcelo2108
Contributor

Hi @SwaggerP and @DataWrangler. Yes, I still have the same issue and no solution so far.

bengidlow
New Contributor II

I was having the same issue deploying a custom pyfunc model, and eventually found a fix by deploying one function at a time to isolate where the issue was. Mine was caused by the vector search component. I was using self-managed embeddings, and it was the initialisation of the embedding client and the vector search client `VectorSearchClient()` in load_context() that was causing the issue. Moving the initialisation of all clients to within the functions they were used in solved it for me. Good luck with your models.
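
To make that concrete, the pattern described above looks roughly like the sketch below in a custom pyfunc wrapper (class and artifact names are hypothetical, not the actual model from this thread):

import mlflow

class RagModel(mlflow.pyfunc.PythonModel):
    # Sketch of the pattern above: keep load_context() limited to local
    # artifacts, and create workspace-authenticated clients lazily where
    # they are used, not at model-load time.
    def load_context(self, context):
        self.index_path = context.artifacts.get("faiss_index")  # hypothetical artifact key

    def _get_retriever(self):
        # clients such as VectorSearchClient() would be initialised here,
        # at request time, instead of inside load_context()
        ...

    def predict(self, context, model_input):
        retriever = self._get_retriever()
        ...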

DataWrangler
New Contributor III

Thanks for the hint @bengidlow, however this did not work for me. I'm just using the dbdemo, so I'm confused why it doesn't just work.

SwaggerP
New Contributor III

Same with me. Just using the dbdemo. It doesn't work.

DataWrangler
New Contributor III

@Retired_mod any help would be greatly appreciated... seems like dbdemos should just work

DataWrangler
New Contributor III

The issue seems to be in the get_retriever() function, at:

    vectorstore = DatabricksVectorSearch(
        vs_index, text_column="content", embedding=embedding_model, columns=["url"]
    )
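
For context, the dbdemos-style get_retriever in question looks roughly like the sketch below (paraphrased from memory, not the exact demo code; the endpoint and index names are placeholders). Both clients it creates authenticate against the workspace when the served model is loaded:

# Rough sketch of a dbdemos-style get_retriever (assumed, not the exact demo code)
import os
from databricks.vector_search.client import VectorSearchClient
from langchain.vectorstores import DatabricksVectorSearch
from langchain.embeddings import DatabricksEmbeddings

def get_retriever(persist_dir: str = None):
    # both clients below need workspace credentials available inside the
    # serving container, e.g. via DATABRICKS_HOST / DATABRICKS_TOKEN
    vsc = VectorSearchClient(
        workspace_url=os.environ.get("DATABRICKS_HOST"),
        personal_access_token=os.environ.get("DATABRICKS_TOKEN"),
    )
    vs_index = vsc.get_index(
        endpoint_name="dbdemos_vs_endpoint",  # placeholder endpoint name
        index_name="catalog.schema.docs_vs_index",  # placeholder index name
    )
    embedding_model = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
    vectorstore = DatabricksVectorSearch(
        vs_index, text_column="content", embedding=embedding_model, columns=["url"]
    )
    return vectorstore.as_retriever()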

SwaggerP
New Contributor III

I tried enhancing said function, even declaring the imports inside it. Still the same error.
