02-06-2024 11:06 AM
I'm trying to serve an LLM LangChain model, and every time it fails with this message:
[6b6448zjll] [2024-02-06 14:09:55 +0000] [1146] [INFO] Booting worker with pid: 1146
[6b6448zjll] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.
I'm trying to enable using
02-07-2024 02:43 PM
Hi @Retired_mod, thanks for your response. I tried a couple of your recommendations with no luck so far. Here's what I did:
Check Model Configuration:
R: I did the configuration similarly (not identically) to 02-Deploy-RAG-Chatbot-Model (LLM with RAG on Databricks - dbdemos). Let me show you what I did on this subject:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedModelInput

w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_SMALL",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # <scope>/<secret> that contains an access token
            }
        )
    ]
)
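For reference, the config above still has to be submitted to the serving-endpoints API. A minimal sketch, assuming the databricks-sdk package; the create-vs-update branching is my assumption about your workflow, not something from the original post:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

existing = [e.name for e in w.serving_endpoints.list()]
if serving_endpoint_name not in existing:
    # Create the endpoint and block until it is ready (or fails)
    w.serving_endpoints.create_and_wait(
        name=serving_endpoint_name, config=endpoint_config
    )
else:
    # For an existing endpoint, only the served-models config is updated
    w.serving_endpoints.update_config_and_wait(
        name=serving_endpoint_name, served_models=endpoint_config.served_models
    )
```

This is a deployment fragment that needs a live workspace, so treat it as a sketch rather than something to copy verbatim.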
Also, I'm using FAISS as the vector store, with the faiss-gpu package.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda:0"}  # embed on the GPU
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# storing embeddings in the vector store
vectorstore = FAISS.from_documents(all_splits, embeddings)
and with save_local and load_local:
# Persist so the index is ready for MLflow
persist_directory = "langchain/faiss_index"
vectorstore.save_local(persist_directory)

def get_retriever(persist_dir: str = None):
    if persist_dir is None:
        db = FAISS.load_local("langchain/faiss_index", embeddings)
    else:
        db = FAISS.load_local(persist_dir, embeddings)
    return db.as_retriever()
Model Name Mapping:
R: I don't even know how to implement this. It seems to be a static method, but where do I put the code below? Do you have a tip?
@staticmethod
def modelname_to_contextsize(modelname: str) -> int:
    model_token_mapping = {
        # ... existing model mappings ...
        "gpt-35-turbo-16k": <max_context_size_for_this_model>,  # Add your model here
    }
    # rest of the method...
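On the "where do I put this" question: modelname_to_contextsize is a method on LangChain's OpenAI LLM classes, so the usual approach is to subclass the LLM class you use and override the method, rather than editing the library. Here is the lookup logic as a plain-Python sketch so the shape is clear; the context sizes below are placeholders I chose for illustration, so verify them against your deployment:

```python
# Illustrative mapping of model name -> max context window (tokens).
# The values are assumptions for this sketch, not authoritative.
MODEL_TOKEN_MAPPING = {
    "gpt-4": 8192,
    "gpt-3.5-turbo": 4096,
    "gpt-35-turbo-16k": 16384,  # assumed value -- check your model's real limit
}

def modelname_to_contextsize(modelname: str) -> int:
    """Return the max context window for a known model name."""
    try:
        return MODEL_TOKEN_MAPPING[modelname]
    except KeyError:
        raise ValueError(f"Unknown model name: {modelname}")
```

In LangChain you would put the extended mapping inside the overridden method of your subclass and then build the chain with that subclass instead of the stock LLM class.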
Output Format Alignment:
R: I think that is OK. I set it up this way, and when I test it in a Databricks notebook it works fine.
from langchain.llms import Databricks

def transform_output(response):
    return str(response)

llm3 = Databricks(
    endpoint_name="mm-v2-llama2-7b-chat-hf",
    extra_params={"temperature": 0.0001, "max_tokens": 120},
    transform_output_fn=transform_output,
)  # SAME RESULT
# llm = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf', extra_params={"temperature": 0.0001, "max_tokens": 120})

# input_text = "What is apache spark?"
input_text = "Qual o tipo do campo WarehouseBalance ?"  # "What is the type of the WarehouseBalance field?"
print(llm3.predict(input_text))
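str(response) works, but Databricks serving endpoints can return a dict or a list of candidates depending on the model behind them. A slightly more defensive version of the transform; note the "candidates"/"predictions"/"choices" keys are my assumptions about possible response shapes, not something confirmed in this thread:

```python
def transform_output(response):
    """Normalize a serving-endpoint response to a plain string."""
    if isinstance(response, str):
        return response
    if isinstance(response, dict):
        # Try a few common response shapes before falling back to str()
        for key in ("candidates", "predictions", "choices"):
            if key in response and response[key]:
                first = response[key][0]
                return first if isinstance(first, str) else str(first)
    if isinstance(response, list) and response:
        return response[0] if isinstance(response[0], str) else str(response[0])
    return str(response)
```

The fallback to str() keeps the original behaviour for any shape not handled explicitly.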
Prompt Assignment:
R: I think that is OK. I set it up this way:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""

prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)
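One thing that often matters for serving a retriever-backed chain like this: when the chain is logged to MLflow, the serving container needs to be told how to rebuild the retriever. mlflow.langchain.log_model accepts loader_fn and persist_dir for exactly this. A sketch under the assumption that you log the chain yourself; the artifact path and requirements list are my placeholders:

```python
import mlflow

with mlflow.start_run():
    mlflow.langchain.log_model(
        chain,
        artifact_path="chain",
        # loader_fn is called inside the serving container to rebuild the
        # retriever; persist_dir is packaged into the model artifacts.
        loader_fn=get_retriever,
        persist_dir=persist_directory,
        pip_requirements=[
            "langchain",
            "faiss-cpu",  # or faiss-gpu if the endpoint has a GPU
            "sentence-transformers",
        ],
    )
```

This is a logging fragment that needs an MLflow tracking context, so it is a sketch of the call shape rather than a runnable snippet.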
Some actions I'm planning to take:
1) Implement another vector store such as Chroma (I think it will not use the GPU)
2) Implement the model name mapping. However, I don't know where to put the code.
Any thoughts?
Thanks
02-08-2024 12:19 PM
Hi @Retired_mod
About the actions I have taken:
1) Implement another vector store such as Chroma - it didn't work. I switched to CPU with Chroma as the vector store, and it showed the same issue:
[86b54lclcl] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.
2) Implement the model name mapping - I still don't know where to put the code. Do you have any information on how to implement this?
02-09-2024 06:46 AM
Hi all,
I tested another way, passing a conda_env parameter instead of pip_requirements, and no luck so far.
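For anyone trying the same swap: conda_env and pip_requirements are mutually exclusive ways of pinning the serving environment when logging a model. A sketch of the conda_env dict shape; the Python version and package list here are placeholders I chose, not values from this thread:

```python
# Example conda environment spec to pass as conda_env=... when logging.
# Versions/packages are illustrative -- pin them to your notebook's versions.
conda_env = {
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {"pip": ["mlflow", "langchain", "faiss-cpu", "sentence-transformers"]},
    ],
}
```

Whichever form you use, the key point is that the packages available in the serving container must match what the model was built with.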
02-16-2024 12:37 PM
I tried to run something like this in another cell
03-05-2024 03:02 AM
Hi @marcelo2108, do you have any progress on this one? We encountered the same issue while deploying a RAG chatbot in Databricks.
03-05-2024 02:29 AM
Hi guys, we encountered the same issue. Is there a resolution to this?
03-05-2024 08:22 AM
Same issue
An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`
03-05-2024 08:24 AM
Hi @SwaggerP and @DataWrangler. Yes, I still have the same issue and no solution so far.
03-07-2024 03:02 AM - edited 03-07-2024 03:02 AM
I was having the same issue deploying a custom pyfunc model, and I eventually found a fix by deploying one function at a time to isolate where the issue was. Mine was caused by the vector search component: I was using self-managed embeddings, and it was the initialising of the embedding client and the vector search client `VectorSearchClient()` in load_context() that was causing the issue. Moving the initialisation of all clients to within the functions where they are used solved it for me. Good luck with your models.
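To make the fix above concrete: the pattern is to keep load_context() free of client construction and create clients lazily inside the methods that use them. A generic plain-Python sketch (class and method names are illustrative, not from the post; the real model would call VectorSearchClient() or FAISS.load_local() where the placeholder object() sits):

```python
class RagModel:
    """Illustrative pyfunc-style wrapper: no clients built at load time."""

    def __init__(self):
        self._retriever = None  # built lazily, never in load_context()

    def load_context(self, context=None):
        # Only read static config/artifact paths here; no network clients.
        self.persist_dir = "langchain/faiss_index"

    def _get_retriever(self):
        if self._retriever is None:
            # In the real model, this is where the vector search client /
            # FAISS index would be initialised, on first use.
            self._retriever = object()
        return self._retriever

    def predict(self, context=None, model_input=None):
        retriever = self._get_retriever()
        return f"answered with {type(retriever).__name__}"
```

Because nothing network-bound runs at load time, the serving container can boot the worker cleanly and the first request pays the one-off initialisation cost instead.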
03-07-2024 05:13 AM
Thanks for the hint @bengidlow, however this did not work for me. I'm just using the dbdemo, so I'm confused why it doesn't just work.
03-07-2024 06:59 AM
Same with me. Just using the dbdemo. It doesn't work.
03-08-2024 04:46 AM
@Retired_mod any help would be greatly appreciated... it seems like dbdemos should just work.
03-08-2024 05:19 AM
The issue seems to be in the get_retriever() function at
03-08-2024 11:57 PM
I tried enhancing that function, even declaring the imports inside it. Still the same error.