
Advanced RAG Retrieval (Reranking, Hierarchical, etc) in Databricks

meetiasha
New Contributor II

Issue with current documentation:

I want to perform advanced RAG using LangChain in Databricks. The documentation explains how to use the Vector Search endpoint URL and an index stored in Unity Catalog, but I could not find any advanced RAG techniques that are easy to implement in Databricks. Can you please advise me with step-by-step documentation on how to proceed with this task?

I would appreciate it if we could implement advanced RAG with minimal reliance on catalogs and endpoints, relying instead on LangChain-exclusive tools that make things easier.

Idea or request for content:

Separate sections, each covering an advanced RAG technique and how to use it in Databricks with minimal reliance on catalogs and endpoints, using LangChain-exclusive tools that make things easier.

1 REPLY

Louis_Frolio
Databricks Employee

Greetings @meetiasha, yes, there is a gap between Databricks’ basic “vector endpoint + catalog index” examples and truly advanced RAG. Below is a step-wise, LangChain-first playbook you can run entirely in Databricks notebooks with local vector stores (FAISS/Chroma), Databricks Secrets, and LCEL, with no Unity Catalog tables or Vector Search endpoints required.

Minimal setup on Databricks
- Install packages in a notebook cell with %pip install langchain langchain-openai langchain-text-splitters langchain-chroma chromadb faiss-cpu; %pip installs notebook-scoped libraries that don’t affect the whole cluster.
- Persist vector stores locally on DBFS (for example, persist_directory="/dbfs/FileStore/rag/chroma" or FAISS index files under /dbfs) and manage paths with dbutils.fs.
- Store API keys (OpenAI, Cohere, etc.) in Databricks Secrets and load them at runtime via dbutils.secrets.get to avoid hardcoding credentials.

```python
# Databricks notebook cell
# %pip install langchain langchain-openai langchain-text-splitters langchain-chroma chromadb faiss-cpu

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

openai_key = dbutils.secrets.get("my-scope", "OPENAI_API_KEY")
emb = OpenAIEmbeddings(api_key=openai_key)
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)

# Persist a Chroma store under DBFS (no catalogs/endpoints)
vs = Chroma(
    collection_name="docs",
    embedding_function=emb,
    persist_directory="/dbfs/FileStore/rag/chroma",
)
```

Multi-Query Retrieval (query expansion)
- MultiQueryRetriever uses an LLM to reformulate a single question into several variants, broadening recall and reducing single‑query blind spots.
- This integrates directly with any vector store retriever, and the chain is composed with LCEL for clean, production‑ready orchestration.

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(api_key=openai_key, model="gpt-4o-mini", temperature=0)
retriever = vs.as_retriever(search_kwargs={"k": 6})

mqr = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context."),
    ("human", "Context:\n{context}\n\nQuestion: {question}"),
])

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

rag = (
    {"context": mqr | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag.invoke("What changed in the latest policy?"))
```

Parent–Child (ParentDocumentRetriever)
- ParentDocumentRetriever stores small chunks for retrieval but returns their larger parent document to preserve context and cut fragment errors.
- Pair it with Chroma/FAISS for storage and an in‑memory or simple key‑value store for parent documents; it remains local and catalog‑free.

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
store = InMemoryStore()

parent_ret = ParentDocumentRetriever(
    vectorstore=vs,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
# parent_ret.add_documents(docs) # run once to index
```
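
After indexing, a quick check shows the behavior described above: matching happens on the small child chunks, but the retriever hands back the larger parent documents. A minimal sketch, assuming the commented add_documents call has been run on your docs:

```python
# Child chunks are matched, parent documents are returned
hits = parent_ret.invoke("What changed in the latest policy?")
print(len(hits), "parent documents; first one has", len(hits[0].page_content), "characters")
```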


Contextual Compression with reranking
- ContextualCompressionRetriever runs a reranker or compressor over initially retrieved documents to keep only the most answer‑bearing snippets.
- You can use an LLM‑based or third‑party reranker (for example, Cohere or Contextual AI) to substantially improve precision at low k.

```python
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

cohere_key = dbutils.secrets.get("my-scope", "COHERE_API_KEY")
compressor = CohereRerank(cohere_api_key=cohere_key, top_n=6)  # requires the cohere package (%pip install cohere)
base_ret = vs.as_retriever(search_kwargs={"k": 20})
cc_ret = ContextualCompressionRetriever(base_retriever=base_ret, base_compressor=compressor)
```
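
It can help to inspect what the reranker keeps before wiring cc_ret into a generation chain. A quick sketch; the relevance_score metadata key is what the Cohere compressor typically attaches to each kept document, so treat that key as an assumption and check your installed version:

```python
# Peek at the reranked shortlist: broad k=20 retrieval narrowed to the top_n=6 by the reranker
top_docs = cc_ret.invoke("What changed in the latest policy?")
for d in top_docs:
    print(d.metadata.get("relevance_score"), "-", d.page_content[:120])
```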


HyDE (Hypothetical Document Embeddings)
- HyDE uses an LLM to synthesize a hypothetical document for the user’s query, embeds that synthetic text, and searches with that embedding to boost recall in sparse or noisy corpora.
- In LangChain, wrap an LLM and embeddings with HypotheticalDocumentEmbedder and use it in place of a standard embedding function to build or query a local vector store, as sketched after the code below.

```python
from langchain.chains import HypotheticalDocumentEmbedder
hyde_emb = HypotheticalDocumentEmbedder.from_llm(llm, emb, "web_search")
# Use hyde_emb with your vector store (e.g., for query embeddings or indexing variants)
```
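
To use HyDE as the second bullet describes, the embedder can stand in wherever a plain embeddings object would go, for example as the embedding function of a separate local Chroma collection. A minimal sketch; the collection name and DBFS path are placeholders:

```python
# Documents are embedded normally; queries are first expanded into a hypothetical answer, then embedded
hyde_vs = Chroma(
    collection_name="docs_hyde",
    embedding_function=hyde_emb,
    persist_directory="/dbfs/FileStore/rag/chroma_hyde",
)
hyde_retriever = hyde_vs.as_retriever(search_kwargs={"k": 6})
```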


Self‑Query Retriever (metadata‑aware filtering)
- SelfQueryRetriever lets an LLM translate natural‑language filters (time ranges, authors, sections) into vector‑store search parameters, improving retrieval control without brittle manual parsing.
- It’s ideal when documents have rich metadata and you want free‑form queries to map to filters like tags or date constraints.

```python
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

# Example metadata schema (adapt to your documents); self-query over Chroma also needs the lark package
fields = [AttributeInfo(name="doc_type", description="Type of document, e.g. 'policy'", type="string")]
sqr = SelfQueryRetriever.from_llm(llm, vs, "Internal company documents", fields)
docs = sqr.get_relevant_documents("security changes in Q3 2024, only policy PDFs")
```


Multi‑Vector and summary vectors
- MultiVectorRetriever stores multiple embeddings per document (for example, raw chunk + summary + title) to expand matches and strengthen recall on terse queries.
- This pairs well with compression or reranking so the final context window remains concise despite broader initial matches.

```python
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore

# Index summary/title vectors and map them back to full docs via a shared "doc_id"
mvr = MultiVectorRetriever(vectorstore=vs, docstore=InMemoryStore(), id_key="doc_id")
# mvr.vectorstore.add_documents(summary_docs)       # each summary carries metadata {"doc_id": ...}
# mvr.docstore.mset(list(zip(ids, original_docs)))  # run once to index
```


Retriever ensembles and routing
- An ensemble can combine lexical (BM25/TF‑IDF) and vector retrievers and weight their scores, often outperforming any single retriever on heterogeneous data.
- With LCEL, dynamically route to different retrievers based on query intent, then merge results before compression and generation; a routing sketch follows the code below.

```python
from langchain.retrievers.ensemble import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_texts([d.page_content for d in docs])  # requires the rank_bm25 package
vector_ret = vs.as_retriever(search_kwargs={"k": 8})
ensemble = EnsembleRetriever(retrievers=[bm25, vector_ret], weights=[0.4, 0.6])
```
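
For the routing idea in the second bullet, here is a small LCEL sketch. The keyword heuristic is purely illustrative (an assumption, not a recommendation); swap in an LLM classifier if you need real intent detection.

```python
from langchain_core.runnables import RunnableLambda

def pick_retriever(query: str):
    # Toy heuristic: quoted phrases or ALL-CAPS tokens suggest exact/lexical lookups
    if '"' in query or any(tok.isupper() and len(tok) > 2 for tok in query.split()):
        return bm25
    return vector_ret

routed = RunnableLambda(lambda q: pick_retriever(q).invoke(q))
# `routed` can replace any retriever in the LCEL chain, e.g. routed | format_docs
```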


Graph‑augmented RAG (optional)
- For relationship‑heavy domains, add a knowledge graph (for example, Neo4j) and use graph queries alongside vector search to ground answers in entities and edges.
- LangChain provides advanced RAG templates with Neo4j that you can adapt to your DBFS‑persisted embeddings workflow.

```python
# See neo4j-advanced-rag template in LangChain; run graph retrieval then fuse with vector hits
```


Local vector stores on DBFS (FAISS/Chroma)
- FAISS and Chroma both run fully local, persist to files, and avoid reliance on Databricks Vector Search or Unity Catalog, fitting the “LangChain‑exclusive” requirement.
- Use DBFS paths for persistence so jobs and notebooks can share indexes predictably across runs without external services; a reload sketch follows the code below.

```python
# Persist FAISS index locally (example)
from langchain_community.vectorstores import FAISS
faiss_vs = FAISS.from_texts([d.page_content for d in docs], embedding=emb)
faiss_vs.save_local("/dbfs/FileStore/rag/faiss_index")
```
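
To share that index across job runs, reload it with FAISS.load_local. Recent langchain_community releases also require allow_dangerous_deserialization=True because the index is pickled, so check the flag against your installed version:

```python
# Reload the persisted index in another notebook or job run
faiss_vs = FAISS.load_local(
    "/dbfs/FileStore/rag/faiss_index",
    emb,
    allow_dangerous_deserialization=True,  # needed on recent langchain_community versions
)
faiss_retriever = faiss_vs.as_retriever(search_kwargs={"k": 8})
```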


Orchestrate with LCEL
- LangChain Expression Language (LCEL) composes retrievers, prompts, LLMs, and parsers into a single, efficient graph that’s easy to test and deploy on jobs.
- Build a standard pattern: {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser().

```python
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# rag chain from earlier sections already follows LCEL; just reuse it in jobs
```
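
For jobs, one convenient pattern is a small helper that composes the shared prompt and LLM with whichever retriever a given workload needs; a sketch reusing names defined earlier in this reply:

```python
def build_rag_chain(retriever):
    """Compose any retriever with the shared prompt/llm into a question-answering chain."""
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt | llm | StrOutputParser()
    )

# rag = build_rag_chain(cc_ret)   # or mqr, parent_ret, ensemble, ...
```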


Putting it together (recommended baseline)
- Start with ParentDocumentRetriever + MultiQueryRetriever to improve recall while returning coherent parent docs, then add ContextualCompressionRetriever with a reranker to tighten the final context, as sketched below.
- Persist your Chroma/FAISS indexes to DBFS, load secrets with dbutils.secrets, and manage packages with %pip to keep everything self‑contained and independent of catalogs and endpoints.
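
Those pieces compose by nesting retrievers: the parent retriever is the base, MultiQueryRetriever expands queries against it, and the reranking compressor trims the result before generation. A sketch built from the objects defined earlier in this reply:

```python
# Recall first (query expansion over parent/child retrieval), precision second (reranking)
expanded = MultiQueryRetriever.from_llm(retriever=parent_ret, llm=llm)
final_ret = ContextualCompressionRetriever(base_retriever=expanded, base_compressor=compressor)

baseline_rag = (
    {"context": final_ret | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)
# print(baseline_rag.invoke("What changed in the latest policy?"))
```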

 

Hope this helps.

Cheers, Louis.

 
