08-04-2025 04:20 AM
What is RAG?
RAG (Retrieval-Augmented Generation) on Databricks refers to building and running AI applications, within the Databricks platform, that combine:
- Retrieval systems (like vector databases or search over documents)
- Generative AI models (such as LLMs like GPT)
RAG on Databricks allows you to:
- Store and index data (e.g., using Delta Lake or Vector Search)
- Retrieve relevant information for a user query
- Feed that into an LLM to generate accurate, context-aware responses
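Conceptually, every RAG request follows the same three steps: retrieve, augment, generate. A minimal sketch, where vector_index and llm are placeholder objects rather than any specific API:
def rag_answer(question: str) -> str:
    # 1. Retrieve: find the stored chunks most similar to the question
    chunks = vector_index.similarity_search(question, k=3)  # placeholder index object
    # 2. Augment: pack the retrieved context into the prompt
    prompt = f"Context:\n{chunks}\n\nQuestion: {question}"
    # 3. Generate: the LLM answers, grounded in that context
    return llm.invoke(prompt)  # placeholder LLM client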
Key Components for RAG on Databricks:
- Databricks Vector Search for fast retrieval
- MLflow for model tracking and deployment
- Foundation Models (like Dolly or external LLMs)
- Databricks Notebooks or Lakehouse AI Agents for orchestration
- Unity Catalog for governance and security
How to build a RAG evaluation pipeline using MLflow evaluation functions
Prerequisites
Before you start, ensure you meet the following requirements:
- Use Databricks Runtime 15.4.x-cpu-ml-scala2.12.
- Install the required libraries by running the following command:
%pip install -U -qq databricks-vectorsearch langchain==0.3.7 flashrank langchain-databricks PyPDF2
dbutils.library.restartPython()
The rest of the article focuses on creating a complete RAG pipeline. At a high level, a request flows through it as follows:
- A user asks a question.
- The question is sent to a serverless chatbot RAG endpoint.
- The endpoint computes embeddings and retrieves relevant documents using the Vector Search Index.
- The retrieved documents are used to enrich the prompt.
- The enriched prompt is sent to the Foundation Model endpoint for a response.
- The system displays the output to the user.
Task 1: Set Up the Retriever Component
The retriever is responsible for fetching relevant documents from the Vector Search Index. Follow these steps:
Define the Components
vs_endpoint_prefix = "vs_endpoint_"
vs_endpoint_name = vs_endpoint_prefix + str(get_fixed_integer(DA.unique_name("_")))
print(f"Assigned Vector Search endpoint name: {vs_endpoint_name}.")
vs_index_fullname = f"{DA.catalog_name}.{DA.schema_name}.pdf_text_self_managed_vs_index"
from databricks.vector_search.client import VectorSearchClient
from langchain_databricks import DatabricksEmbeddings
from langchain_core.runnables import RunnableLambda
from langchain.docstore.document import Document
from flashrank import Ranker, RerankRequest
Set Up the Retriever
Define the retriever to return 3 relevant documents:
def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):
    def retrieve(query, k: int = 3):
        if isinstance(query, dict):
            query = next(iter(query.values()))  # unwrap chain inputs like {"input": "..."}
        # ... embed the query, search the Vector Search index, and return the top k documents
    return RunnableLambda(retrieve)
Test the retriever with a sample prompt.
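The course materials elide the retrieval body, so here is a minimal sketch of what a complete version can look like, combining a Vector Search similarity search with flashrank reranking. The embedding endpoint (databricks-gte-large-en), the column names (pdf_name, content), and the candidate count are assumptions, not the lab's exact code:
def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):
    embedding_model = DatabricksEmbeddings(endpoint="databricks-gte-large-en")  # assumed endpoint
    vsc = VectorSearchClient(disable_notice=True)
    vs_index = vsc.get_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)
    ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir=cache_dir)
    def retrieve(query, k: int = 3):
        if isinstance(query, dict):
            query = next(iter(query.values()))
        # Embed the question and pull back more candidates than we need
        query_vector = embedding_model.embed_query(query)
        results = vs_index.similarity_search(
            query_vector=query_vector,
            columns=["pdf_name", "content"],  # assumed column names
            num_results=10,
        )
        passages = [
            {"id": i, "file": row[0], "text": row[1]}
            for i, row in enumerate(results.get("result", {}).get("data_array", []))
        ]
        # Rerank the candidates and keep the top k
        ranked = ranker.rerank(RerankRequest(query=query, passages=passages))[:k]
        return [Document(page_content=r["text"], metadata={"source": r["file"]}) for r in ranked]
    return RunnableLambda(retrieve)
# Quick smoke test with a sample prompt
for doc in get_retriever().invoke("What is Generative AI?"):
    print(doc.metadata["source"], doc.page_content[:100])
Fetching ten candidates and reranking down to three trades a little latency for noticeably better top results.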
Task 2: Set Up the Foundation Model
Use a Foundation Model such as Llama 3.1 to generate responses.
Define and Test the Model
from langchain_databricks import ChatDatabricks
chat_model = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct", max_tokens=275)
print(f"Test chat model: {chat_model.invoke('What is Generative AI?')}")Task 3: Assemble the Complete RAG Solution
Integrate the retriever and foundation model into a unified pipeline.
Define the Prompt Template
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate
TEMPLATE = """You are an assistant for a GENAI teaching class. You are answering questions related to Generative AI and its impact on human life. If the question is not related to these topics, kindly decline to answer. If you don't know the answer, just say so.

Use the following pieces of retrieved context to answer the question:
{context}

Question: {input}

Answer:
"""
Create the Chain
prompt = PromptTemplate(input_variables=["context", "input"], template=TEMPLATE)
combine_docs_chain = create_stuff_documents_chain(llm=chat_model, prompt=prompt)
chain = create_retrieval_chain(
    retriever=get_retriever(),
    combine_docs_chain=combine_docs_chain,
)
question = {"input": "How does Generative AI impact humans?"}
answer = chain.invoke(question)
print(answer["answer"])
Task 4: Save the Model to Model Registry in Unity Catalog
Register the Model
from mlflow.models import infer_signature
import mlflow
mlflow.set_registry_uri("databricks-uc")
model_name = f"{DA.catalog_name}.{DA.schema_name}.rag_app_demo4"
with mlflow.start_run(run_name="rag_app_demo4") as run:
    signature = infer_signature(question, answer)
    mlflow.log_param("model_type", "RAG")
    # Log with the langchain flavor so the chain reloads as a chain
    mlflow.langchain.log_model(
        chain,
        artifact_path="model",
        signature=signature,
    )

mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name=model_name,
)
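Evaluate the Pipeline with MLflow
Since the goal is an evaluation pipeline, the last step is scoring the chain's outputs. A minimal sketch using mlflow.evaluate in static-dataset mode; the evaluation set below is a placeholder, and you would supply real questions and reference answers from your own test set:
import pandas as pd

# Hypothetical evaluation set: the question, the chain's answer, and a reference answer
eval_data = pd.DataFrame({
    "question": ["How does Generative AI impact humans?"],
    "answer": [answer["answer"]],
    "ground_truth": ["<reference answer from your test set>"],
})

with mlflow.start_run(run_name="rag_eval"):
    results = mlflow.evaluate(
        data=eval_data,
        predictions="answer",  # column holding the model's output
        targets="ground_truth",  # column holding the reference answer
        model_type="question-answering",
    )
    print(results.metrics)  # exact_match plus toxicity and readability metrics by default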
Clean Up Resources
Delete all resources created during this course to avoid unnecessary costs.
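For example, a minimal sketch of tearing down the Vector Search resources created in Task 1 (verify the exact method signatures against your databricks-vectorsearch client version):
# Drop the index first, then the endpoint that served it
vsc = VectorSearchClient(disable_notice=True)
vsc.delete_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)
vsc.delete_endpoint(vs_endpoint_name)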
Conclusion
In this article, we demonstrated how to construct a comprehensive RAG application using Databricks. We:
- Assembled key components like the Vector Search retriever and Foundation Model.
- Created a pipeline to retrieve relevant documents and generate enriched responses.
- Evaluated the performance using MLflow.
- Registered the RAG application in Unity Catalog for production use.
#RAG #GenAI
Labels: GenAI and LLMs
08-19-2025 04:50 PM
Thanks
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa
08-23-2025 11:39 AM
Thanks for sharing @snehamore811