<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Advanced RAG Retrieval (Reranking, Hierarchical, etc) in Databricks in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/advanced-rag-retrieval-reranking-hierarchical-etc-in-databricks/m-p/104462#M690</link>
    <description>&lt;H3&gt;Issue with current documentation:&lt;/H3&gt;&lt;P&gt;I wish to perform advanced RAG using LangChain in Databricks. The documentation explains how to use the vector endpoint URL and an index stored in catalogs, but I could not find any advanced RAG algorithms that are easily implemented in Databricks. Can you please point me to step-wise documentation on how to proceed with this task?&lt;/P&gt;&lt;P&gt;I would appreciate it if we could implement advanced RAG with minimal reliance on catalogs and endpoints, relying instead on LangChain-native tools that make things easier.&lt;/P&gt;&lt;H3&gt;Idea or request for content:&lt;/H3&gt;&lt;P&gt;Separate sections, each covering an advanced RAG technique and how to use it in Databricks with minimal reliance on catalogs and endpoints, relying instead on LangChain-native tools.&lt;/P&gt;</description>
    <pubDate>Tue, 07 Jan 2025 07:10:35 GMT</pubDate>
    <dc:creator>meetiasha</dc:creator>
    <dc:date>2025-01-07T07:10:35Z</dc:date>
    <item>
      <title>Advanced RAG Retrieval (Reranking, Hierarchical, etc) in Databricks</title>
      <link>https://community.databricks.com/t5/generative-ai/advanced-rag-retrieval-reranking-hierarchical-etc-in-databricks/m-p/104462#M690</link>
      <description>&lt;H3&gt;Issue with current documentation:&lt;/H3&gt;&lt;P&gt;I wish to perform advanced RAG using LangChain in Databricks. The documentation explains how to use the vector endpoint URL and an index stored in catalogs, but I could not find any advanced RAG algorithms that are easily implemented in Databricks. Can you please point me to step-wise documentation on how to proceed with this task?&lt;/P&gt;&lt;P&gt;I would appreciate it if we could implement advanced RAG with minimal reliance on catalogs and endpoints, relying instead on LangChain-native tools that make things easier.&lt;/P&gt;&lt;H3&gt;Idea or request for content:&lt;/H3&gt;&lt;P&gt;Separate sections, each covering an advanced RAG technique and how to use it in Databricks with minimal reliance on catalogs and endpoints, relying instead on LangChain-native tools.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2025 07:10:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/advanced-rag-retrieval-reranking-hierarchical-etc-in-databricks/m-p/104462#M690</guid>
      <dc:creator>meetiasha</dc:creator>
      <dc:date>2025-01-07T07:10:35Z</dc:date>
    </item>
    <item>
      <title>Re: Advanced RAG Retrieval (Reranking, Hierarchical, etc) in Databricks</title>
      <link>https://community.databricks.com/t5/generative-ai/advanced-rag-retrieval-reranking-hierarchical-etc-in-databricks/m-p/138232#M1363</link>
      <description>&lt;P&gt;Greetings&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/141051"&gt;@meetiasha&lt;/a&gt;&amp;nbsp;, yes—there’s a gap between Databricks’ basic “vector endpoint + catalog index” examples and truly advanced RAG, so below is a step‑wise, LangChain‑first playbook you can run entirely on Databricks notebooks with local vector stores (FAISS/Chroma), secrets, and LCEL—no Unity Catalog tables or Vector Search endpoints required.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Minimal setup on Databricks&lt;/STRONG&gt;&lt;BR /&gt;- Install packages in a notebook cell: %pip install langchain langchain-openai langchain-text-splitters langchain-chroma chromadb faiss-cpu, which uses notebook‑scoped libraries that don’t affect the whole cluster.&lt;BR /&gt;- Persist vector stores locally on DBFS (for example, persist_directory="/dbfs/FileStore/rag/chroma" or FAISS index files under /dbfs) and manage paths with dbutils.fs.&lt;BR /&gt;- Store API keys (OpenAI, Cohere, etc.) in Databricks Secrets and load them at runtime via dbutils.secrets.get to avoid hardcoding credentials.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# Databricks notebook cell&lt;BR /&gt;# %pip install langchain langchain-openai langchain-text-splitters langchain-chroma chromadb faiss-cpu&lt;/P&gt;
&lt;P&gt;from langchain_openai import OpenAIEmbeddings, ChatOpenAI&lt;BR /&gt;from langchain_text_splitters import RecursiveCharacterTextSplitter&lt;BR /&gt;from langchain_chroma import Chroma&lt;/P&gt;
&lt;P&gt;openai_key = dbutils.secrets.get("my-scope", "OPENAI_API_KEY")&lt;BR /&gt;emb = OpenAIEmbeddings(api_key=openai_key)&lt;BR /&gt;splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)&lt;/P&gt;
&lt;P&gt;# Persist a Chroma store under DBFS (no catalogs/endpoints)&lt;BR /&gt;vs = Chroma(collection_name="docs", embedding_function=emb,&lt;BR /&gt;            persist_directory="/dbfs/FileStore/rag/chroma")&lt;BR /&gt;```&lt;/P&gt;
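&lt;P&gt;The snippets below assume a docs list of loaded LangChain documents. A minimal sketch to build and index one, assuming plain-text files under an example DBFS folder:&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;import glob&lt;BR /&gt;from langchain_community.document_loaders import TextLoader&lt;/P&gt;
&lt;P&gt;# Load source files into `docs` (the folder and file type are examples)&lt;BR /&gt;docs = []&lt;BR /&gt;for path in glob.glob("/dbfs/FileStore/rag/raw/*.txt"):&lt;BR /&gt;    docs.extend(TextLoader(path).load())&lt;BR /&gt;vs.add_documents(splitter.split_documents(docs))  # index chunks into the Chroma store&lt;BR /&gt;```&lt;/P&gt;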
&lt;P&gt;&lt;STRONG&gt;Multi-Query Retrieval (query expansion)&lt;/STRONG&gt;&lt;BR /&gt;- MultiQueryRetriever uses an LLM to reformulate a single question into several variants, broadening recall and reducing single‑query blind spots.&lt;BR /&gt;- This integrates directly with any vector store retriever, and the chain is composed with LCEL for clean, production‑ready orchestration.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;from langchain.retrievers.multi_query import MultiQueryRetriever&lt;BR /&gt;from langchain_core.runnables import RunnablePassthrough&lt;BR /&gt;from langchain_core.output_parsers import StrOutputParser&lt;BR /&gt;from langchain_core.prompts import ChatPromptTemplate&lt;/P&gt;
&lt;P&gt;llm = ChatOpenAI(api_key=openai_key, model="gpt-4o-mini", temperature=0)&lt;BR /&gt;retriever = vs.as_retriever(search_kwargs={"k": 6})&lt;/P&gt;
&lt;P&gt;mqr = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)&lt;/P&gt;
&lt;P&gt;prompt = ChatPromptTemplate.from_messages([&lt;BR /&gt;    ("system", "Answer using only the provided context."),&lt;BR /&gt;    ("human", "Context:\n{context}\n\nQuestion: {question}"),&lt;BR /&gt;])&lt;/P&gt;
&lt;P&gt;def format_docs(docs): return "\n\n".join(d.page_content for d in docs)&lt;/P&gt;
&lt;P&gt;rag = ({"context": mqr | format_docs, "question": RunnablePassthrough()}&lt;BR /&gt;| prompt | llm | StrOutputParser())&lt;/P&gt;
&lt;P&gt;print(rag.invoke("What changed in the latest policy?"))&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Parent–Child (ParentDocumentRetriever)&lt;/STRONG&gt;&lt;BR /&gt;- ParentDocumentRetriever stores small chunks for retrieval but returns their larger parent document, preserving context and reducing fragmented, out-of-context answers.&lt;BR /&gt;- Pair it with Chroma/FAISS for storage and an in‑memory or simple key‑value store for parent documents; it remains local and catalog‑free.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;from langchain.retrievers import ParentDocumentRetriever&lt;BR /&gt;from langchain.storage import InMemoryStore&lt;/P&gt;
&lt;P&gt;parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)&lt;BR /&gt;child_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)&lt;BR /&gt;store = InMemoryStore()&lt;/P&gt;
&lt;P&gt;# Tip: consider a dedicated Chroma collection if vs already holds plain chunks&lt;BR /&gt;parent_ret = ParentDocumentRetriever(&lt;BR /&gt;    vectorstore=vs,&lt;BR /&gt;    docstore=store,&lt;BR /&gt;    child_splitter=child_splitter,&lt;BR /&gt;    parent_splitter=parent_splitter,&lt;BR /&gt;)&lt;BR /&gt;# parent_ret.add_documents(docs)  # run once to index the corpus&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Contextual Compression with reranking&lt;/STRONG&gt;&lt;BR /&gt;- ContextualCompressionRetriever runs a reranker or compressor over initially retrieved documents to keep only the most answer‑bearing snippets.&lt;BR /&gt;- You can use an LLM‑based or third‑party reranker (for example, Cohere or Contextual AI) to substantially improve precision at low k.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# %pip install langchain-cohere  # the reranker now lives in its own package&lt;BR /&gt;from langchain.retrievers.contextual_compression import ContextualCompressionRetriever&lt;BR /&gt;from langchain_cohere import CohereRerank&lt;/P&gt;
&lt;P&gt;cohere_key = dbutils.secrets.get("my-scope", "COHERE_API_KEY")&lt;BR /&gt;compressor = CohereRerank(cohere_api_key=cohere_key, model="rerank-english-v3.0", top_n=6)&lt;BR /&gt;base_ret = vs.as_retriever(search_kwargs={"k": 20})&lt;BR /&gt;cc_ret = ContextualCompressionRetriever(base_retriever=base_ret, base_compressor=compressor)&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;HyDE (Hypothetical Document Embeddings)&lt;/STRONG&gt;&lt;BR /&gt;- HyDE uses an LLM to synthesize a hypothetical document for the user’s query, embeds that synthetic text, and searches with that embedding to boost recall in sparse or noisy corpora.&lt;BR /&gt;- In LangChain, wrap an LLM and embeddings with HypotheticalDocumentEmbedder and use it in place of a standard embedding function to build or query a local vector store.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;from langchain.chains import HypotheticalDocumentEmbedder&lt;BR /&gt;hyde_emb = HypotheticalDocumentEmbedder.from_llm(llm, emb, "web_search")&lt;BR /&gt;# Use hyde_emb with your vector store (e.g., for query embeddings or indexing variants)&lt;BR /&gt;```&lt;/P&gt;
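&lt;P&gt;A query-time usage sketch, assuming the Chroma store vs from the setup step:&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# Embed a hypothetical answer, then search the store with that vector&lt;BR /&gt;q_vec = hyde_emb.embed_query("What changed in the latest policy?")&lt;BR /&gt;hyde_hits = vs.similarity_search_by_vector(q_vec, k=6)&lt;BR /&gt;```&lt;/P&gt;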
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Self‑Query Retriever (metadata‑aware filtering)&lt;/STRONG&gt;&lt;BR /&gt;- SelfQueryRetriever lets an LLM translate natural‑language filters (time ranges, authors, sections) into vector‑store search parameters, improving retrieval control without brittle manual parsing.&lt;BR /&gt;- It’s ideal when documents have rich metadata and you want free‑form queries to map to filters like tags or date constraints.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# %pip install lark  # required by the self-query translator&lt;BR /&gt;from langchain.retrievers.self_query.base import SelfQueryRetriever&lt;BR /&gt;from langchain.chains.query_constructor.base import AttributeInfo&lt;/P&gt;
&lt;P&gt;# Illustrative metadata schema; adapt field names to your documents&lt;BR /&gt;fields = [AttributeInfo(name="doc_type", description="e.g. 'policy'", type="string"),&lt;BR /&gt;          AttributeInfo(name="year", description="publication year", type="integer")]&lt;BR /&gt;sqr = SelfQueryRetriever.from_llm(llm, vs, "internal policy documents", fields)&lt;BR /&gt;results = sqr.invoke("security changes in Q3 2024, only policy PDFs")&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Multi‑Vector and summary vectors&lt;/STRONG&gt;&lt;BR /&gt;- MultiVectorRetriever stores multiple embeddings per document (for example, raw chunk + summary + title) to expand matches and strengthen recall on terse queries.&lt;BR /&gt;- This pairs well with compression or reranking so the final context window remains concise despite broader initial matches.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;from langchain.retrievers.multi_vector import MultiVectorRetriever&lt;/P&gt;
&lt;P&gt;mv_store = InMemoryStore()  # InMemoryStore imported in the parent–child section&lt;BR /&gt;mvr = MultiVectorRetriever(vectorstore=vs, docstore=mv_store, id_key="doc_id")&lt;BR /&gt;```&lt;/P&gt;
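&lt;P&gt;To populate it, embed per-document summaries (or titles) under a shared doc_id and keep the originals in the docstore. A minimal sketch reusing the llm and docs defined above; the summary prompt is illustrative:&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;import uuid&lt;BR /&gt;from langchain_core.documents import Document&lt;/P&gt;
&lt;P&gt;ids = [str(uuid.uuid4()) for _ in docs]&lt;BR /&gt;summaries = [llm.invoke("Summarize in two sentences:\n" + d.page_content[:2000]).content for d in docs]&lt;BR /&gt;vs.add_documents([Document(page_content=s, metadata={"doc_id": i}) for s, i in zip(summaries, ids)])&lt;BR /&gt;mvr.docstore.mset(list(zip(ids, docs)))  # full documents are returned at query time&lt;BR /&gt;```&lt;/P&gt;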
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Retriever ensembles and routing&lt;/STRONG&gt;&lt;BR /&gt;- An ensemble can combine lexical (BM25/TF‑IDF) and vector retrievers and weight their scores, often outperforming any single retriever on heterogeneous data.&lt;BR /&gt;- With LCEL, dynamically route to different retrievers based on the query intent, then merge results before compression and generation.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# %pip install rank_bm25  # dependency of BM25Retriever&lt;BR /&gt;from langchain.retrievers.ensemble import EnsembleRetriever&lt;BR /&gt;from langchain_community.retrievers import BM25Retriever&lt;/P&gt;
&lt;P&gt;bm25 = BM25Retriever.from_texts([d.page_content for d in docs])&lt;BR /&gt;vector_ret = vs.as_retriever(search_kwargs={"k": 8})&lt;BR /&gt;ensemble = EnsembleRetriever(retrievers=[bm25, vector_ret], weights=[0.4, 0.6])&lt;BR /&gt;```&lt;/P&gt;
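&lt;P&gt;The merged hits feed straight into the downstream steps; for example, reranking them with the Cohere compressor defined above:&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# Rerank the merged lexical + vector candidates before generation&lt;BR /&gt;hybrid_ret = ContextualCompressionRetriever(base_retriever=ensemble, base_compressor=compressor)&lt;BR /&gt;```&lt;/P&gt;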
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Graph‑augmented RAG (optional)&lt;/STRONG&gt;&lt;BR /&gt;- For relationship‑heavy domains, add a knowledge graph (for example, Neo4j) and use graph queries alongside vector search to ground answers in entities and edges.&lt;BR /&gt;- LangChain provides advanced RAG templates with Neo4j that you can adapt to your DBFS‑persisted embeddings workflow.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# See neo4j-advanced-rag template in LangChain; run graph retrieval then fuse with vector hits&lt;BR /&gt;```&lt;/P&gt;
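&lt;P&gt;A minimal sketch of the graph side, assuming a reachable Neo4j instance, %pip install neo4j, and placeholder connection details and Cypher:&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;from langchain_community.graphs import Neo4jGraph&lt;/P&gt;
&lt;P&gt;neo4j_pwd = dbutils.secrets.get("my-scope", "NEO4J_PASSWORD")  # hypothetical secret key&lt;BR /&gt;graph = Neo4jGraph(url="bolt://HOST:7687", username="neo4j", password=neo4j_pwd)&lt;BR /&gt;facts = graph.query("MATCH (e)-[r]-&gt;(n) RETURN e.name, type(r), n.name LIMIT 10")&lt;BR /&gt;# Fuse: prepend the graph facts to the vector-retrieved context before prompting&lt;BR /&gt;```&lt;/P&gt;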
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Local vector stores on DBFS (FAISS/Chroma)&lt;/STRONG&gt;&lt;BR /&gt;- FAISS and Chroma both run fully local, persist to files, and avoid reliance on Databricks Vector Search or Unity Catalog, fitting the “LangChain‑exclusive” requirement.&lt;BR /&gt;- Use DBFS paths for persistence so jobs and notebooks can share indexes predictably across runs without external services.&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# Persist a FAISS index locally (example)&lt;BR /&gt;from langchain_community.vectorstores import FAISS&lt;BR /&gt;faiss_vs = FAISS.from_texts([d.page_content for d in docs], embedding=emb)&lt;BR /&gt;faiss_vs.save_local("/dbfs/FileStore/rag/faiss_index")&lt;BR /&gt;# Reload in a later job/notebook run (flag required by recent versions):&lt;BR /&gt;# faiss_vs = FAISS.load_local("/dbfs/FileStore/rag/faiss_index", emb, allow_dangerous_deserialization=True)&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Orchestrate with LCEL&lt;/STRONG&gt;&lt;BR /&gt;- LangChain Expression Language (LCEL) composes retrievers, prompts, LLMs, and parsers into a single, efficient graph that’s easy to test and deploy on jobs.&lt;BR /&gt;- Build a standard pattern: {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser().&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;from langchain_core.runnables import RunnablePassthrough&lt;BR /&gt;from langchain_core.output_parsers import StrOutputParser&lt;BR /&gt;# the rag chain from the Multi-Query section already follows LCEL; reuse it in jobs&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Putting it together (recommended baseline)&lt;/STRONG&gt;&lt;BR /&gt;- Start with ParentDocumentRetriever + MultiQueryRetriever to improve recall while returning coherent parent docs, then add ContextualCompressionRetriever with a reranker to tighten final context.&lt;BR /&gt;- Persist your Chroma/FAISS indexes to DBFS, load secrets with dbutils.secrets, and manage packages with %pip to keep everything self‑contained and independent of catalogs and endpoints.&lt;/P&gt;
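&lt;P&gt;As a concrete starting point, a sketch wiring those pieces together (it reuses parent_ret, compressor, prompt, llm, and format_docs from the sections above; treat it as a baseline to tune, not a definitive pipeline):&lt;/P&gt;
&lt;P&gt;```python&lt;BR /&gt;# Recall: multi-query expansion over parent-document retrieval&lt;BR /&gt;mq_parent = MultiQueryRetriever.from_llm(retriever=parent_ret, llm=llm)&lt;BR /&gt;# Precision: rerank the broadened candidate set down to a tight context&lt;BR /&gt;final_ret = ContextualCompressionRetriever(base_retriever=mq_parent, base_compressor=compressor)&lt;BR /&gt;baseline = ({"context": final_ret | format_docs, "question": RunnablePassthrough()}&lt;BR /&gt;            | prompt | llm | StrOutputParser())&lt;BR /&gt;print(baseline.invoke("What changed in the latest policy?"))&lt;BR /&gt;```&lt;/P&gt;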
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hoping this helps you.&lt;/P&gt;
&lt;P&gt;Cheers, Louis.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 08 Nov 2025 22:16:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/advanced-rag-retrieval-reranking-hierarchical-etc-in-databricks/m-p/138232#M1363</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-08T22:16:52Z</dc:date>
    </item>
  </channel>
</rss>

