What is RAG?
RAG (Retrieval-Augmented Generation) on Databricks refers to building and running AI applications that combine:
Retrieval systems (like vector databases or search over documents)
Generative AI models (such as LLMs like GPT)
RAG on Databricks allows you to:
Store and index data (e.g., using Delta Lake or vector search)
Retrieve relevant information for a user query
Feed that into an LLM to generate accurate, context-aware responses
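Conceptually, those three steps reduce to a retrieve-then-generate loop. The sketch below is schematic only; the retriever, prompt, and llm objects are stand-ins, and the real implementations follow later in this article:

docs = retriever.invoke(user_query)  # 1. fetch relevant chunks from the index
context = "\n\n".join(doc.page_content for doc in docs)  # 2. assemble the context
answer = llm.invoke(prompt.format(context=context, question=user_query))  # 3. generate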
Key Databricks components for RAG include:
Databricks Vector Search for fast retrieval
MLflow for model tracking and deployment
Foundation Models (like Dolly or external LLMs)
Databricks Notebooks or Lakehouse AI Agents for orchestration
Unity Catalog for governance and security
How to build a RAG evaluation pipeline using MLflow evaluation functions
Before you start, install the required libraries by running the following command:
%pip install -U -qq databricks-vectorsearch langchain==0.3.7 flashrank langchain-databricks PyPDF2
dbutils.library.restartPython()
This walkthrough focuses on creating a complete RAG pipeline, from retrieval to model registration.
The retriever is responsible for fetching relevant documents from the Vector Search Index. Follow these steps:
# The endpoint and index names come from the course helpers (DA); adjust them for your workspace.
vs_endpoint_prefix = "vs_endpoint_"
vs_endpoint_name = vs_endpoint_prefix + str(get_fixed_integer(DA.unique_name("_")))
print(f"Assigned Vector Search endpoint name: {vs_endpoint_name}.")
# Fully qualified (catalog.schema.index) name of the self-managed Vector Search index.
vs_index_fullname = f"{DA.catalog_name}.{DA.schema_name}.pdf_text_self_managed_vs_index"
from databricks.vector_search.client import VectorSearchClient
from langchain_databricks import DatabricksEmbeddings
from langchain_core.runnables import RunnableLambda
from langchain.docstore.document import Document
from flashrank import Ranker, RerankRequest
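If the endpoint has not been provisioned yet, you can guard against that before querying it. A minimal sketch using the databricks-vectorsearch client's list_endpoints and create_endpoint calls (note that endpoint provisioning can take several minutes):

vsc = VectorSearchClient(disable_notice=True)
existing = [e["name"] for e in vsc.list_endpoints().get("endpoints", [])]
if vs_endpoint_name not in existing:
    vsc.create_endpoint(name=vs_endpoint_name, endpoint_type="STANDARD")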
Define the retriever to return 3 relevant documents:
def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):
    def retrieve(query, k: int = 3):
        # The chain passes a dict like {"input": "..."}; extract the raw query text.
        if isinstance(query, dict):
            query = next(iter(query.values()))
        # Code to search the index and return the top-k results as Documents
        ...
    return RunnableLambda(retrieve)
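The placeholder above elides the actual search logic. Here is a minimal end-to-end sketch; the pdf_name/content column names, the databricks-gte-large-en embedding endpoint, and FlashRank's ms-marco-MiniLM-L-12-v2 re-ranking model are assumptions beyond the original snippet:

def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):
    # Assumed embedding endpoint; it must match the one used to build the index.
    embedding_model = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
    vsc = VectorSearchClient(disable_notice=True)
    vs_index = vsc.get_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)

    def retrieve(query, k: int = 3):
        if isinstance(query, dict):
            query = next(iter(query.values()))
        # Self-managed index: embed the query ourselves, then over-fetch for re-ranking.
        results = vs_index.similarity_search(
            query_vector=embedding_model.embed_query(query),
            columns=["pdf_name", "content"],  # assumed index columns
            num_results=10,
        )
        hits = results.get("result", {}).get("data_array", [])
        # Re-rank the hits with FlashRank and keep the top k; cache_dir stores the model.
        ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir=cache_dir)
        passages = [{"id": i, "text": content, "meta": {"source": name}}
                    for i, (name, content, *_) in enumerate(hits)]
        ranked = ranker.rerank(RerankRequest(query=query, passages=passages))[:k]
        return [Document(page_content=p["text"], metadata=p["meta"]) for p in ranked]

    return RunnableLambda(retrieve)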
Test the retriever with a sample prompt.
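For example, with a hypothetical query (assuming the sketch above):

retriever = get_retriever()
docs = retriever.invoke("How does Generative AI impact human life?")
for doc in docs:
    print(doc.metadata["source"], "->", doc.page_content[:100])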
Use a foundation model such as Llama 3.1 to generate responses.
from langchain_databricks import ChatDatabricks
chat_model = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct", max_tokens=275)
print(f"Test chat model: {chat_model.invoke('What is Generative AI?')}")
Integrate the retriever and foundation model into a unified pipeline.
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate
TEMPLATE = """You are an assistant for a GenAI teaching class. You are answering questions related to Generative AI and its impact on human life. If the question is not related to these topics, kindly decline to answer. If you don't know the answer, just say so.

Use the following pieces of context to answer the question:
{context}

Question: {input}
Answer:
"""
# create_retrieval_chain expects a retriever plus a documents chain: stuff the
# retrieved documents into the prompt, then pass the result to the chat model.
combine_docs_chain = create_stuff_documents_chain(chat_model, PromptTemplate.from_template(TEMPLATE))
chain = create_retrieval_chain(get_retriever(), combine_docs_chain)
question = {"input": "How does Generative AI impact humans?"}
answer = chain.invoke(question)
print(answer["answer"])  # the result dict also contains "input" and "context"
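Because the chain also returns the retrieved documents, you can inspect which sources grounded the answer:

for doc in answer["context"]:
    print(doc.metadata.get("source"))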
from mlflow.models import infer_signature
import mlflow
mlflow.set_registry_uri("databricks-uc")
model_name = f"{DA.catalog_name}.{DA.schema_name}.rag_app_demo4"
with mlflow.start_run(run_name="rag_app_demo4") as run:
    signature = infer_signature(question, answer)
    mlflow.log_param("model_type", "RAG")
    # Log the chain with MLflow's langchain flavor; pyfunc cannot log a
    # LangChain chain object directly.
    mlflow.langchain.log_model(
        chain,
        artifact_path="model",
        signature=signature,
    )
mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name=model_name,
)
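To close the loop on evaluation, you can score the registered model against a small labeled set with mlflow.evaluate. A minimal sketch, assuming a hypothetical two-row eval set and version 1 of the model registered above:

import pandas as pd
eval_df = pd.DataFrame({
    "input": [
        "How does Generative AI impact humans?",
        "What is a large language model?",
    ],
    "ground_truth": [
        "Generative AI changes how people create, work, and learn.",
        "A model trained on large text corpora to generate language.",
    ],
})
with mlflow.start_run(run_name="rag_app_demo4_eval"):
    results = mlflow.evaluate(
        model=f"models:/{model_name}/1",  # assumes version 1 from the registration above
        data=eval_df,
        targets="ground_truth",
        model_type="question-answering",
    )
    print(results.metrics)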
Delete all resources created during this course to avoid unnecessary costs.
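A minimal cleanup sketch, assuming the endpoint and index names from earlier:

vsc = VectorSearchClient(disable_notice=True)
vsc.delete_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)
vsc.delete_endpoint(vs_endpoint_name)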
In this article, we demonstrated how to construct a comprehensive RAG application using Databricks. We:
Set up a Vector Search endpoint and index for retrieval
Built a retriever that fetches and re-ranks relevant documents
Connected a foundation model to generate grounded answers
Combined both into a retrieval chain
Logged, registered, and evaluated the model with MLflow
#RAG #GenAI