<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Problem when serving a langchain model on Databricks in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63751#M3117</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;, thanks for your valuable inputs. I have a question about your code:&lt;/P&gt;&lt;PRE&gt;embedding_model = DatabricksEmbeddings(endpoint="databricks-bge-large-en")&lt;/PRE&gt;&lt;P&gt;You need UC enabled, right? In case I don't have UC enabled, could I use HuggingFace embeddings instead with DatabricksVectorSearch?&lt;/P&gt;</description>
    <pubDate>Fri, 15 Mar 2024 01:12:28 GMT</pubDate>
    <dc:creator>marcelo2108</dc:creator>
    <dc:date>2024-03-15T01:12:28Z</dc:date>
    <item>
      <title>Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59506#M2966</link>
      <description>&lt;P&gt;I'm trying to serve an LLM LangChain model and every time it fails with this message:&lt;/P&gt;&lt;PRE&gt;[6b6448zjll] [2024-02-06 14:09:55 +0000] [1146] [INFO] Booting worker with pid: 1146
[6b6448zjll] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.&lt;/PRE&gt;&lt;P&gt;I'm trying to enable it using:&lt;/P&gt;&lt;PRE&gt;"scale_to_zero_enabled": "False",
"workload_type": "GPU_SMALL",
"workload_size": "Small",&lt;/PRE&gt;&lt;P&gt;I tried using code and using the UI, and it shows this error every time. I'm logging the model successfully as follows:&lt;/P&gt;&lt;PRE&gt;import mlflow
import langchain
from mlflow.models import infer_signature

with mlflow.start_run() as run:
    signature = infer_signature(question, answer)
    logged_model = mlflow.langchain.log_model(
        lc_model=llm_chain,
        artifact_path="model",
        registered_model_name="llamav2-llm-chain",
        metadata={"task": "llm/v1/completions"},
        pip_requirements=["mlflow==" + mlflow.__version__, "langchain==" + langchain.__version__],
        signature=signature,
        await_registration_for=900  # wait for 15 minutes for model registration to complete
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)&lt;/PRE&gt;</description>
      <pubDate>Tue, 06 Feb 2024 19:06:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59506#M2966</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-06T19:06:16Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59624#M2973</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thanks for your response. I tried a couple of your recommendations with no luck so far. What I did so far:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Check Model Configuration:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class="lia-list-style-type-disc"&gt;&lt;LI&gt;Ensure that you’ve correctly configured the model. Double-check the settings related to scale_to_zero_enabled, workload_type, and workload_size. Make sure they match your intended setup.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;R:&lt;/STRONG&gt; I did a similar (not identical) configuration, compared with 02-Deploy-RAG-Chatbot-Model (LLM with RAG on Databricks - dbdemos). Here is what I did on this subject:&lt;/P&gt;&lt;PRE&gt;w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_SMALL",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # &amp;lt;scope&amp;gt;/&amp;lt;secret&amp;gt; that contains an access token
            }
        )
    ]
)&lt;/PRE&gt;&lt;P&gt;Also, I'm using FAISS as the vector store with the FAISS GPU package:&lt;/P&gt;&lt;PRE&gt;model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda:0"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# storing embeddings in the vector store
vectorstore = FAISS.from_documents(all_splits, embeddings)&lt;/PRE&gt;&lt;P&gt;and with save_local and load:&lt;/P&gt;&lt;PRE&gt;# Persist to be ready for mlflow
persist_directory = "langchain/faiss_index"
vectorstore.save_local(persist_directory)

def get_retriever(persist_dir: str = None):
    if persist_dir is None:
        db = FAISS.load_local("langchain/faiss_index", embeddings)
    else:
        db = FAISS.load_local(persist_dir, embeddings)
    return db.as_retriever()&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Model Name Mapping:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class="lia-list-style-type-disc"&gt;&lt;LI&gt;Sometimes, errors like this occur because the model name isn’t included in the model_token_mapping dictionary. To resolve this, add your model (e.g., “gpt-35-turbo-16k”) to the dictionary along with its correspo....&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;R: I don't even know how to implement this. It seems to be a static method, but where do I put the code below? Do you have a tip?&lt;/P&gt;&lt;PRE&gt;@staticmethod
def modelname_to_contextsize(modelname: str) -&amp;gt; int:
    model_token_mapping = {
        # ... existing model mappings ...
        "gpt-35-turbo-16k": &amp;lt;max_context_size_for_this_model&amp;gt;,  # Add your model here
    }

    # rest of the method...&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Output Format Alignment:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class="lia-list-style-type-disc"&gt;&lt;LI&gt;Verify that the output format of your LLM aligns with what your agent expects. If necessary, adjust the parsing logic to handle the specific output format of your custom LLM.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;R:&lt;/STRONG&gt; I think that is OK; I did it this way, and when I test it in the Databricks notebook it works fine:&lt;/P&gt;&lt;PRE&gt;def transform_output(response):
    return str(response)

llm3 = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf', extra_params={"temperature": 0.0001, "max_tokens": 120}, transform_output_fn=transform_output)  # SAME RESULT

# llm = Databricks(endpoint_name='mm-v2-llama2-7b-chat-hf', extra_params={"temperature": 0.0001, "max_tokens": 120})

# input_text = "What is apache spark?"
input_text = "Qual o tipo do campo WarehouseBalance ?"

print(llm3.predict(input_text))&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Prompt Assignment:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class="lia-list-style-type-disc"&gt;&lt;LI&gt;When iterating over LLM models, try assigning the prompt inline instead of using a variable. For example: chain = LLMChain(llm=llm_model, prompt=PromptTemplate(template=template, input_variables=['context', 'prompt']))&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;R: I think that is OK; I did it this way:&lt;/P&gt;&lt;PRE&gt;from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)&lt;/PRE&gt;&lt;P&gt;Some actions I'm planning to do:&lt;BR /&gt;1) Implement another vector store such as Chroma (I think it will not use GPU)&lt;BR /&gt;2) Implement the model name mapping. However, I don't know where to put the code.&lt;/P&gt;&lt;P&gt;Any thoughts?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 22:43:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59624#M2973</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-07T22:43:58Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59722#M2978</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;About the actions I have taken:&lt;BR /&gt;&lt;BR /&gt;1) Implement another vector store such as Chroma (I think it will not use GPU) - It didn't work. I changed to CPU with Chroma as the vector store and it showed the same issue:&lt;BR /&gt;[86b54lclcl] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.&lt;BR /&gt;&lt;BR /&gt;2) Implement the model name mapping. However, I don't know where to put the code. Do you have any information on how to implement this?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 20:19:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59722#M2978</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-08T20:19:59Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59795#M2985</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I tested another way, putting a conda_env parameter instead of pip_requirements, and no luck so far:&lt;/P&gt;&lt;PRE&gt;conda_env = {
    "name": "mlflow-env",
    "channels": ["defaults"],  # it was conda-forge
    "dependencies": [
        "python=3.10.12",
        "gunicorn=20.1.0",
        {
            "pip": ["mlflow==" + mlflow.__version__, "langchain==" + langchain.__version__, "sentence_transformers", "chromadb"],
        },
    ],
}&lt;/PRE&gt;&lt;P&gt;Has anyone else run into this problem when serving an LLM model with LangChain and Llama? Llama was previously enabled as a custom model with success in Databricks. However, the problem happens when I use LangChain with a model loaded as follows:&lt;/P&gt;&lt;PRE&gt;from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, DW and platform, API or infrastructure administration question related to Databricks. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=llm3,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)&lt;/PRE&gt;&lt;P&gt;and deploy it as a serving model with:&lt;/P&gt;&lt;PRE&gt;w = WorkspaceClient()
endpoint_config = EndpointCoreConfigInput(
    name=serving_endpoint_name,
    served_models=[
        ServedModelInput(
            model_name=model_name,
            model_version=latest_model_version,
            workload_size="Small",
            workload_type="GPU_LARGE",
            scale_to_zero_enabled=False,
            environment_vars={
                "DATABRICKS_TOKEN": "{{secrets/kb-kv-secrets/adb-kb-ml-token}}",  # &amp;lt;scope&amp;gt;/&amp;lt;secret&amp;gt; that contains an access token
            }
        )
    ]
)&lt;/PRE&gt;&lt;P&gt;It shows this message in Databricks when the serving model fails:&lt;/P&gt;&lt;PRE&gt;[58c45b9xxw] [2024-02-09 14:20:06 +0000] [495] [INFO] Worker exiting (pid: 495)
[58c45b9xxw] [2024-02-09 14:20:06 +0000] [589] [INFO] Booting worker with pid: 589
[58c45b9xxw] /opt/conda/envs/mlflow-env/lib/python3.10/site-packages/pydantic/_internal/_config.py:322: UserWarning: Valid config keys have changed in V2:
[58c45b9xxw] * 'schema_extra' has been renamed to 'json_schema_extra'
[58c45b9xxw] warnings.warn(message, UserWarning)
[58c45b9xxw] An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`.&lt;/PRE&gt;</description>
      <pubDate>Fri, 09 Feb 2024 14:46:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/59795#M2985</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-09T14:46:23Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/60445#M3007</link>
      <description>&lt;P&gt;I tried to run in another cell something like this:&lt;/P&gt;&lt;PRE&gt;!/opt/conda/envs/mlflow-env/bin/gunicorn configure&lt;/PRE&gt;&lt;P&gt;and it showed the error:&lt;/P&gt;&lt;PRE&gt;/bin/bash: line 1: /opt/conda/envs/mlflow-env/bin/gunicorn: No such file or directory&lt;/PRE&gt;</description>
      <pubDate>Fri, 16 Feb 2024 20:37:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/60445#M3007</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-02-16T20:37:08Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62635#M3072</link>
      <description>&lt;P&gt;Hi guys, we encountered the same issue. Do we have a resolution to this?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 10:29:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62635#M3072</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-05T10:29:18Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62638#M3073</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79528"&gt;@marcelo2108&lt;/a&gt;, do you have any progress on this one? We encountered the same issue while deploying a RAG chatbot in Databricks.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 11:02:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62638#M3073</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-05T11:02:25Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62676#M3074</link>
      <description>&lt;P&gt;Same issue&lt;/P&gt;&lt;P&gt;An error occurred while loading the model. You haven't configured the CLI yet! Please configure by entering `/opt/conda/envs/mlflow-env/bin/gunicorn configure`&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 16:22:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62676#M3074</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-05T16:22:12Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62677#M3075</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/101712"&gt;@SwaggerP&lt;/a&gt; and &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;. Yes, I still have the same issue and no solution so far.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2024 16:24:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62677#M3075</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-03-05T16:24:48Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62866#M3080</link>
      <description>&lt;P&gt;I was having the same issue deploying a custom pyfunc model, and eventually found a fix by deploying one function at a time to isolate where the issue was. Mine was caused by the vector search component - I was using self-managed embeddings, and it was the initialising of the embedding client and the vector search client `VectorSearchClient()` in load_context() that was causing this issue. Moving the initialisation of all clients to within the functions they were used in solved it for me. Good luck with your models.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 11:02:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62866#M3080</guid>
      <dc:creator>bengidlow</dc:creator>
      <dc:date>2024-03-07T11:02:50Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62901#M3081</link>
      <description>&lt;P&gt;Thanks for the hint &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/101881"&gt;@bengidlow&lt;/a&gt;; however, this did not work for me. I'm just using the dbdemo, so I'm confused why it doesn't just work.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 13:13:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62901#M3081</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-07T13:13:02Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62907#M3082</link>
      <description>&lt;P&gt;Same with me. Just using the dbdemo. It doesn't work.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 14:59:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/62907#M3082</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-07T14:59:54Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63052#M3085</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt; any help would be greatly appreciated... it seems like dbdemos should just work.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Mar 2024 12:46:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63052#M3085</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-08T12:46:20Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63054#M3086</link>
      <description>&lt;P&gt;The issue seems to be in the get_retriever() function at:&lt;/P&gt;&lt;PRE&gt;    vectorstore = DatabricksVectorSearch(
        vs_index, text_column="content", embedding=embedding_model, columns=["url"]
    )&lt;/PRE&gt;</description>
      <pubDate>Fri, 08 Mar 2024 13:19:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63054#M3086</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-08T13:19:29Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63116#M3090</link>
      <description>&lt;P&gt;I tried enhancing the said function, even declaring imports inside it. Still the same error.&lt;/P&gt;</description>
      <pubDate>Sat, 09 Mar 2024 07:57:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63116#M3090</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-09T07:57:58Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63124#M3092</link>
      <description>&lt;P&gt;All, I've fixed the error. Though, to be honest, I'm not exactly sure what ended up doing it. I was trying to do it systematically, but I lost track. Nonetheless, I hope the code below helps.&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/101712"&gt;@SwaggerP&lt;/a&gt; &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79528"&gt;@marcelo2108&lt;/a&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def get_retriever(persist_dir: str = None):
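    # Note: following the fix reported earlier in this thread, every client (embeddings,
    # VectorSearchClient, vector store) is created inside this loader function rather than
    # at module level or in load_context(); initialising them outside the function was
    # reported to trigger the "You haven't configured the CLI yet" error at serving time.
    # The persist_dir argument is unused here; it only mirrors the earlier FAISS-based
    # get_retriever signature.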
    import gunicorn
    from databricks.vector_search.client import VectorSearchClient
    from langchain_community.vectorstores import DatabricksVectorSearch
    from langchain_community.embeddings import DatabricksEmbeddings
    from langchain_community.chat_models import ChatDatabricks
    from langchain.chains import RetrievalQA
    import logging

    import traceback
    logging.basicConfig(filename='error.log', level=logging.DEBUG)
    
    
    print('libraries loaded')
    # token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
    embedding_model = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

    print('initialized embedding_model')

    #Get the vector search index
    vsc = VectorSearchClient(workspace_url=os.environ["DATABRICKS_HOST"], 
     personal_access_token=os.environ["DATABRICKS_TOKEN"],
     disable_notice=True                  
    )
    
    print('initialized VectorSearchClient')
    
    vs_index = vsc.get_index(
        endpoint_name='vectorsearch',
        index_name=vsIndexName
    )
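    # Assumption: vsIndexName is defined earlier in the notebook (the full index name,
    # e.g. catalog.schema.index_name), and 'vectorsearch' is the Vector Search endpoint
    # that hosts that index.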

    print('initialized vs_index')

    # Create the retriever
    try:
        print('trying to initialize vectorstore')

        vectorstore = DatabricksVectorSearch(
            vs_index, text_column="content", embedding=embedding_model, columns=["url"]
        )

        retriever = vectorstore.as_retriever(search_kwargs={'k': 4})

        print('initialized vectorstore')

        return  retriever
    except BaseException as e:
        print("An error occurred:", str(e))
        traceback.print_exc()


from langchain.vectorstores import DatabricksVectorSearch
import os
from langchain_community.chat_models import ChatDatabricks
from langchain.chains import RetrievalQA
from langchain import hub
prompt = hub.pull("rlm/rag-prompt", api_url="https://api.hub.langchain.com")
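# rlm/rag-prompt is a generic RAG prompt with "context" and "question" inputs; a local
# PromptTemplate such as the TEMPLATE used earlier in this thread should also work here.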

retriever = get_retriever()

chat_model = ChatDatabricks(endpoint="databricks-llama-2-70b-chat")


qa_chain = RetrievalQA.from_chain_type(
    chat_model,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)


import mlflow
import langchain
from mlflow.models import infer_signature



with mlflow.start_run(run_name=runName) as run:
    question = "qiestopm jere?"
    result = qa_chain({"query": question})
    signature = infer_signature(result['query'], result['result'])

    model_info = mlflow.langchain.log_model(
        qa_chain,
        loader_fn=get_retriever,  # Load the retriever with DATABRICKS_TOKEN env as secret (for authentication).
        artifact_path="chain",
        registered_model_name=fq_model_name,
        pip_requirements=[
            "mlflow",
            "langchain",
            "langchain_community",
            "databricks-vectorsearch",
            "pydantic==2.5.2 --no-binary pydantic",
            "cloudpickle",
            "langchainhub"
        ],
        input_example=result,
        signature=signature,
    )
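# The explicit pip_requirements above (langchain_community, databricks-vectorsearch,
# langchainhub, and the pydantic pin) keep the serving container consistent with the
# notebook environment; the pydantic pin lines up with the pydantic V2 warnings seen in
# the serving logs earlier in this thread.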


import urllib
import json
import mlflow
import requests
import time
from mlflow.tracking import MlflowClient


client = MlflowClient()
model_name = f"{fq_model_name}"
serving_endpoint_name = servingName



#TODO: use the sdk once model serving is available.
serving_client = EndpointApiClient()
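# Assumption: EndpointApiClient comes from the dbdemos helper notebooks, not the
# Databricks SDK; earlier posts in this thread used WorkspaceClient() with
# EndpointCoreConfigInput / ServedModelInput from the SDK instead.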


auto_capture_config = {
    "catalog_name": catalog,
    "schema_name": db,
    "table_name_prefix": serving_endpoint_name
    } 
environment_vars={
  "DATABRICKS_HOST" : "{{secrets/azurekeyvault/hostsecrethere}}",
  "DATABRICKS_TOKEN" : "{{secrets/azurekeyvault/pathere}}"
}
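# DATABRICKS_HOST and DATABRICKS_TOKEN are read by get_retriever() inside the serving
# container, so they are injected here as secret-backed environment variables
# ({{secrets/scope/key}} references; here they point at an Azure Key Vault-backed scope).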

serving_client.create_endpoint_if_not_exists(serving_endpoint_name, 
                                             model_name=model_name.lower(), 
                                             model_version = 33, 
                                             workload_size="Small", 
                                             scale_to_zero_enabled=True, 
                                             wait_start = True, 
                                             auto_capture_config=auto_capture_config, 
                                             environment_vars=environment_vars
                                             )&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 09 Mar 2024 17:09:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63124#M3092</guid>
      <dc:creator>DataWrangler</dc:creator>
      <dc:date>2024-03-09T17:09:56Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63682#M3111</link>
      <description>&lt;P&gt;Thank you &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;.&lt;BR /&gt;Mine is now successfully deployed, but I am now facing this 'Forbidden for url' issue whenever I query the endpoint.&lt;BR /&gt;In our workspace, PATs are not allowed, hence we need to use a service principal.&lt;/P&gt;&lt;P&gt;Is the probable cause the service principal?&lt;/P&gt;&lt;P&gt;403 Client Error: Forbidden for url: /serving-endpoints/databricks-mixtral-8x7b-instruct/invocations&lt;/P&gt;</description>
      <pubDate>Thu, 14 Mar 2024 11:48:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63682#M3111</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-14T11:48:45Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63744#M3116</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/101712"&gt;@SwaggerP&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;&amp;nbsp; Any solution?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Mar 2024 20:23:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63744#M3116</guid>
      <dc:creator>ADS1</dc:creator>
      <dc:date>2024-03-14T20:23:33Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63751#M3117</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/41942"&gt;@DataWrangler&lt;/a&gt;, thanks for your valuable inputs. I have a question about your code:&lt;/P&gt;&lt;PRE&gt;embedding_model = DatabricksEmbeddings(endpoint="databricks-bge-large-en")&lt;/PRE&gt;&lt;P&gt;You need UC enabled, right? In case I don't have UC enabled, could I use HuggingFace embeddings instead with DatabricksVectorSearch?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2024 01:12:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/63751#M3117</guid>
      <dc:creator>marcelo2108</dc:creator>
      <dc:date>2024-03-15T01:12:28Z</dc:date>
    </item>
    <item>
      <title>Re: Problem when serving a langchain model on Databricks</title>
      <link>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/64675#M3154</link>
      <description>&lt;P&gt;BGE is part of the Foundation Model APIs, so there is no need for Unity Catalog for this. Mine is also deployed successfully.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2024 15:24:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/problem-when-serving-a-langchain-model-on-databricks/m-p/64675#M3154</guid>
      <dc:creator>SwaggerP</dc:creator>
      <dc:date>2024-03-26T15:24:25Z</dc:date>
    </item>
  </channel>
</rss>

