
Databricks for RAG: Build, Run, Evaluate

snehamore811
New Contributor III

What is RAG?

RAG (Retrieval-Augmented Generation) on Databricks refers to building and running AI applications that combine:

  • Retrieval systems (like vector databases or search over documents)

  • Generative AI models (such as LLMs like GPT)

within the Databricks platform.

RAG on Databricks allows you to:

  • Store and index data (e.g., using Delta Lake or vector search)

  • Retrieve relevant information for a user query

  • Feed that into an LLM to generate accurate, context-aware responses
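Under the hood, the retrieval step is nearest-neighbour search over embeddings. A minimal pure-Python sketch of the idea, with toy hand-written vectors standing in for a real embedding model and for Databricks Vector Search:

```python
import math

# Toy "embeddings": in a real system these come from an embedding model endpoint
docs = {
    "Delta Lake stores table data": [0.9, 0.1, 0.0],
    "LLMs generate text from prompts": [0.1, 0.9, 0.2],
    "Unity Catalog governs access": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec, k=2):
    # Rank documents by similarity to the query vector and keep the top k
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query_vec), reverse=True)
    return ranked[:k]

print(retrieve([0.2, 0.8, 0.1]))
```

Vector Search performs the same ranking at scale, with approximate-nearest-neighbour indexing instead of a full scan.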

Key Components for RAG on Databricks:

  • Databricks Vector Search for fast retrieval

  • MLflow for model tracking and deployment

  • Foundation Models (like Dolly or external LLMs)

  • Databricks Notebooks or Lakehouse AI Agents for orchestration

  • Unity Catalog for governance and security

How do you build a RAG evaluation pipeline using MLflow evaluation functions?

Prerequisites

Before you start, ensure you meet the following requirements:

  • Use Databricks Runtime 15.4.x-cpu-ml-scala2.12.

Install the required libraries by running the following commands:

%pip install -U -qq databricks-vectorsearch langchain==0.3.7 flashrank langchain-databricks PyPDF2

dbutils.library.restartPython()

This article focuses on creating a complete RAG pipeline. The flow works as follows:

  1. A user asks a question.
  2. The question is sent to a serverless chatbot RAG endpoint.
  3. The endpoint computes embeddings and retrieves relevant documents using the Vector Search Index.
  4. The retrieved documents are used to enrich the prompt.
  5. The enriched prompt is sent to the Foundation Model endpoint for a response.
  6. The system displays the output to the user.
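The six steps can be sketched end to end with stand-ins for the real endpoints (keyword-overlap retrieval instead of embeddings, and a stub in place of the Foundation Model endpoint; all names here are illustrative):

```python
DOCS = [
    "Generative AI can draft emails and summarize documents for office workers.",
    "Delta Lake provides ACID transactions on data lakes.",
]

def retrieve(question, k=1):
    # Step 3 stand-in: score documents by word overlap with the question
    words = set(question.lower().split())
    return sorted(DOCS, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:k]

def enrich_prompt(question, context):
    # Step 4: splice the retrieved documents into the prompt
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

def stub_llm(prompt):
    # Step 5 stand-in: a real system calls the Foundation Model endpoint here
    return "Based on the context: " + prompt.split("Context:\n")[1].split("\n\n")[0]

def rag_answer(question):
    # Steps 1-6 glued together
    context = "\n".join(retrieve(question))
    return stub_llm(enrich_prompt(question, context))

print(rag_answer("How can Generative AI help office workers?"))
```

The remaining tasks replace each stand-in with the managed Databricks component.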

Task 1: Set Up the Retriever Component

The retriever is responsible for fetching relevant documents from the Vector Search Index. Follow these steps:

Define the Components

vs_endpoint_prefix = "vs_endpoint_"
vs_endpoint_name = vs_endpoint_prefix + str(get_fixed_integer(DA.unique_name("_")))
print(f"Assigned Vector Search endpoint name: {vs_endpoint_name}.")

vs_index_fullname = f"{DA.catalog_name}.{DA.schema_name}.pdf_text_self_managed_vs_index"
from databricks.vector_search.client import VectorSearchClient
from langchain_databricks import DatabricksEmbeddings
from langchain_core.runnables import RunnableLambda
from langchain.docstore.document import Document
from flashrank import Ranker, RerankRequest

Set Up the Retriever

Define the retriever to return 3 relevant documents:

def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):
    def retrieve(query, k: int=3):
        if isinstance(query, dict):
            query = next(iter(query.values()))  # unwrap {"input": "..."} style queries
        vs_index = VectorSearchClient(disable_notice=True).get_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)
        # "content" must match a column in your index schema
        results = vs_index.similarity_search(query_text=query, columns=["content"], num_results=k)
        return [Document(page_content=row[0]) for row in results["result"]["data_array"]]
    return RunnableLambda(retrieve)

Test the retriever with a sample prompt.
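The flashrank import above hints at a second stage: reranking the retrieved candidates before they reach the LLM. Conceptually it looks like this (a pure-Python stand-in for flashrank's Ranker, scoring by term overlap where flashrank would use a cross-encoder model):

```python
def rerank(query, passages, top_n=2):
    # Score each candidate against the query and keep the best top_n;
    # a real reranker replaces this overlap score with a learned relevance model
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(p.lower().split())), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_n]]

candidates = [
    "Spark clusters process big data workloads.",
    "Generative AI models create new content such as text.",
    "Generative AI impacts creative and knowledge work.",
]
print(rerank("How does Generative AI impact work?", candidates))
```

Reranking trims the k retrieved hits down to the few that are most relevant, which keeps the enriched prompt short and focused.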

Task 2: Set Up the Foundation Model

Use a Foundation Model such as Llama 3.1 to generate responses.

Define and Test the Model

from langchain_databricks import ChatDatabricks

chat_model = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct", max_tokens=275)
print(f"Test chat model: {chat_model.invoke('What is Generative AI?')}")

Task 3: Assemble the Complete RAG Solution

Integrate the retriever and foundation model into a unified pipeline.

Define the Prompt Template

from langchain.chains import create_retrieval_chain
from langchain.prompts import PromptTemplate

TEMPLATE = """You are an assistant for a GenAI teaching class. You are answering questions related to Generative AI and its impact on human life. If the question is not related to these topics, kindly decline to answer. If you don't know the answer, just say so. Use the following pieces of context to answer the question.

Context: {context}

Question: {input}

Answer:"""

Create the Chain

In LangChain 0.3.x, create_retrieval_chain takes a retriever and a document-combining chain, so the prompt and LLM are first combined with create_stuff_documents_chain:

from langchain.chains.combine_documents import create_stuff_documents_chain

question_answer_chain = create_stuff_documents_chain(chat_model, PromptTemplate.from_template(TEMPLATE))
chain = create_retrieval_chain(get_retriever(), question_answer_chain)

question = {"input": "How does Generative AI impact humans?"}
answer = chain.invoke(question)
print(answer)
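To see what the Foundation Model actually receives, the template substitution can be reproduced with plain string formatting (illustrative values; the chain performs this internally):

```python
# A trimmed-down version of the prompt template used above
template = (
    "You are an assistant for a GenAI teaching class. "
    "Use the following context to answer.\n\n"
    "Context: {context}\n\nQuestion: {input}\nAnswer:"
)

# The chain fills {context} with the retrieved documents and {input} with the user question
filled = template.format(
    context="Generative AI automates drafting and summarization.",
    input="How does Generative AI impact humans?",
)
print(filled)
```

This enriched prompt, rather than the bare question, is what gets sent to the model endpoint.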

Task 4: Save the Model to Model Registry in Unity Catalog

Register the Model

from mlflow.models import infer_signature
import mlflow

mlflow.set_registry_uri("databricks-uc")
model_name = f"{DA.catalog_name}.{DA.schema_name}.rag_app_demo4"

with mlflow.start_run(run_name="rag_app_demo4") as run:
    signature = infer_signature(question, answer)
    mlflow.log_param("model_type", "RAG")
    # Log with the LangChain flavor (a LangChain runnable is not a pyfunc PythonModel)
    mlflow.langchain.log_model(
        chain,
        artifact_path="model",
        signature=signature,
        registered_model_name=model_name,
    )
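The section heading above promises an evaluation pipeline. On Databricks, mlflow.evaluate with model_type="question-answering" scores a logged model against a labelled dataset; the core comparison it automates, matching generated answers to ground truth, can be sketched locally with a toy token-overlap F1 (illustrative only, not MLflow's implementation):

```python
def token_f1(prediction, ground_truth):
    # Token-level F1: harmonic mean of precision and recall over shared tokens
    pred, gold = set(prediction.lower().split()), set(ground_truth.lower().split())
    common = pred & gold
    if not common:
        return 0.0
    precision = len(common) / len(pred)
    recall = len(common) / len(gold)
    return 2 * precision * recall / (precision + recall)

# Each pair is (model answer, ground-truth answer); values here are made up
eval_set = [
    ("Generative AI creates text and images", "Generative AI creates text images and code"),
]
scores = [token_f1(p, g) for p, g in eval_set]
print(sum(scores) / len(scores))
```

In a real pipeline you would pass the registered model URI and a labelled evaluation DataFrame to mlflow.evaluate and log the resulting metrics alongside the run.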

Clean Up Resources

Delete all resources created during this course to avoid unnecessary costs.

Conclusion

In this article, we demonstrated how to construct a comprehensive RAG application using Databricks. We:

  • Assembled key components like the Vector Search retriever and Foundation Model.
  • Created a pipeline to retrieve relevant documents and generate enriched responses.
  • Evaluated the performance using MLflow.
  • Registered the RAG application in Unity Catalog for production use.

#RAG #GenAI

 

4 REPLIES

CT_snehamore
New Contributor III

hey thanks for sharing

Thanks 

WiliamRosa
New Contributor II

Thanks

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa

szymon_dybczak
Esteemed Contributor III

Thanks for sharing @snehamore811