08-04-2025 04:20 AM
What is RAG?
RAG (Retrieval-Augmented Generation) on Databricks refers to building and running AI applications, within the Databricks platform, that combine:
- Retrieval systems (like vector databases or search over documents)
- Generative AI models (such as LLMs like GPT)
RAG on Databricks allows you to:
- Store and index data (e.g., using Delta Lake or Vector Search)
- Retrieve relevant information for a user query
- Feed that into an LLM to generate accurate, context-aware responses
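Conceptually, every RAG request follows the same three steps: retrieve, augment, generate. A minimal sketch, where vector_index and llm are placeholder objects rather than any specific API:
def rag_answer(question: str) -> str:
    # 1. Retrieve: find the stored chunks most similar to the question
    chunks = vector_index.similarity_search(question, k=3)  # placeholder index object
    # 2. Augment: pack the retrieved context into the prompt
    prompt = f"Context:\n{chunks}\n\nQuestion: {question}"
    # 3. Generate: the LLM answers, grounded in that context
    return llm.invoke(prompt)  # placeholder LLM client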
Key Components for RAG on Databricks:
- Databricks Vector Search for fast retrieval
- MLflow for model tracking and deployment
- Foundation Models (like Dolly or external LLMs)
- Databricks Notebooks or Lakehouse AI Agents for orchestration
- Unity Catalog for governance and security
How to build a RAG evaluation pipeline using MLflow evaluation functions
Prerequisites
Before you start, ensure you meet the following requirements:
- Use Databricks Runtime 15.4.x-cpu-ml-scala2.12.
- Install the required libraries by running the following command:
%pip install -U -qq databricks-vectorsearch langchain==0.3.7 flashrank langchain-databricks PyPDF2
dbutils.library.restartPython()
The rest of the article focuses on creating a complete RAG pipeline. At a high level, a request flows through it as follows:
- A user asks a question.
- The question is sent to a serverless chatbot RAG endpoint.
- The endpoint computes embeddings and retrieves relevant documents using the Vector Search Index.
- The retrieved documents are used to enrich the prompt.
- The enriched prompt is sent to the Foundation Model endpoint for a response.
- The system displays the output to the user.
Task 1: Set Up the Retriever Component
The retriever is responsible for fetching relevant documents from the Vector Search Index. Follow these steps:
Define the Components
vs_endpoint_prefix = "vs_endpoint_"
vs_endpoint_name = vs_endpoint_prefix + str(get_fixed_integer(DA.unique_name("_")))
print(f"Assigned Vector Search endpoint name: {vs_endpoint_name}.")
vs_index_fullname = f"{DA.catalog_name}.{DA.schema_name}.pdf_text_self_managed_vs_index"
from databricks.vector_search.client import VectorSearchClient
from langchain_databricks import DatabricksEmbeddings
from langchain_core.runnables import RunnableLambda
from langchain.docstore.document import Document
from flashrank import Ranker, RerankRequest
Set Up the Retriever
Define the retriever to return 3 relevant documents:
def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):
    def retrieve(query, k: int = 3):
        if isinstance(query, dict):
            query = next(iter(query.values()))  # unwrap chain inputs like {"input": "..."}
        # ... embed the query, search the Vector Search index, and return the top k documents
    return RunnableLambda(retrieve)
Test the retriever with a sample prompt.
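The course materials elide the retrieval body, so here is a minimal sketch of what a complete version can look like, combining a Vector Search similarity search with flashrank reranking. The embedding endpoint (databricks-gte-large-en), the column names (pdf_name, content), and the candidate count are assumptions, not the lab's exact code:
def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):
    embedding_model = DatabricksEmbeddings(endpoint="databricks-gte-large-en")  # assumed endpoint
    vsc = VectorSearchClient(disable_notice=True)
    vs_index = vsc.get_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)
    ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir=cache_dir)
    def retrieve(query, k: int = 3):
        if isinstance(query, dict):
            query = next(iter(query.values()))
        # Embed the question and pull back more candidates than we need
        query_vector = embedding_model.embed_query(query)
        results = vs_index.similarity_search(
            query_vector=query_vector,
            columns=["pdf_name", "content"],  # assumed column names
            num_results=10,
        )
        passages = [
            {"id": i, "file": row[0], "text": row[1]}
            for i, row in enumerate(results.get("result", {}).get("data_array", []))
        ]
        # Rerank the candidates and keep the top k
        ranked = ranker.rerank(RerankRequest(query=query, passages=passages))[:k]
        return [Document(page_content=r["text"], metadata={"source": r["file"]}) for r in ranked]
    return RunnableLambda(retrieve)
# Quick smoke test with a sample prompt
for doc in get_retriever().invoke("What is Generative AI?"):
    print(doc.metadata["source"], doc.page_content[:100])
Fetching ten candidates and reranking down to three trades a little latency for noticeably better top results.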
Task 2: Set Up the Foundation Model
Use a Foundation Model such as Llama 3.1 to generate responses.
Define and Test the Model
from langchain_databricks import ChatDatabricks
chat_model = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct", max_tokens=275)
print(f"Test chat model: {chat_model.invoke('What is Generative AI?')}")Task 3: Assemble the Complete RAG Solution
Integrate the retriever and foundation model into a unified pipeline.
Define the Prompt Template
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate
TEMPLATE = """You are an assistant for a GENAI teaching class. You are answering questions related to Generative AI and its impact on human life. If the question is not related to these topics, kindly decline to answer. If you don't know the answer, just say so.

Use the following pieces of retrieved context to answer the question:
{context}

Question: {input}

Answer:
"""
Create the Chain
prompt = PromptTemplate(input_variables=["context", "input"], template=TEMPLATE)
combine_docs_chain = create_stuff_documents_chain(llm=chat_model, prompt=prompt)
chain = create_retrieval_chain(
    retriever=get_retriever(),
    combine_docs_chain=combine_docs_chain,
)
question = {"input": "How does Generative AI impact humans?"}
answer = chain.invoke(question)
print(answer["answer"])
Task 4: Save the Model to Model Registry in Unity Catalog
Register the Model
from mlflow.models import infer_signature
import mlflow
mlflow.set_registry_uri("databricks-uc")
model_name = f"{DA.catalog_name}.{DA.schema_name}.rag_app_demo4"
with mlflow.start_run(run_name="rag_app_demo4") as run:
    signature = infer_signature(question, answer)
    mlflow.log_param("model_type", "RAG")
    # Log with the langchain flavor so the chain reloads as a chain
    mlflow.langchain.log_model(
        chain,
        artifact_path="model",
        signature=signature,
    )

mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name=model_name,
)
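Evaluate the Pipeline with MLflow
Since the goal is an evaluation pipeline, the last step is scoring the chain's outputs. A minimal sketch using mlflow.evaluate in static-dataset mode; the evaluation set below is a placeholder, and you would supply real questions and reference answers from your own test set:
import pandas as pd

# Hypothetical evaluation set: the question, the chain's answer, and a reference answer
eval_data = pd.DataFrame({
    "question": ["How does Generative AI impact humans?"],
    "answer": [answer["answer"]],
    "ground_truth": ["<reference answer from your test set>"],
})

with mlflow.start_run(run_name="rag_eval"):
    results = mlflow.evaluate(
        data=eval_data,
        predictions="answer",  # column holding the model's output
        targets="ground_truth",  # column holding the reference answer
        model_type="question-answering",
    )
    print(results.metrics)  # exact_match plus toxicity and readability metrics by default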
Clean Up Resources
Delete all resources created during this course to avoid unnecessary costs.
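For example, a minimal sketch of tearing down the Vector Search resources created in Task 1 (verify the exact method signatures against your databricks-vectorsearch client version):
# Drop the index first, then the endpoint that served it
vsc = VectorSearchClient(disable_notice=True)
vsc.delete_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)
vsc.delete_endpoint(vs_endpoint_name)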
Conclusion
In this article, we demonstrated how to construct a comprehensive RAG application using Databricks. We:
- Assembled key components like the Vector Search retriever and Foundation Model.
- Created a pipeline to retrieve relevant documents and generate enriched responses.
- Evaluated the performance using MLflow.
- Registered the RAG application in Unity Catalog for production use.
#RAG #GenAI
Labels: GenAI and LLMs
08-19-2025 04:50 PM
Thanks
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa
08-23-2025 11:39 AM
Thanks for sharing @snehamore811