Imagine you’re running a company with multiple departments, like Finance, Legal, and HR. Each department has its own sensitive data—financial reports, legal contracts, and employee records—that need to stay private. Now, picture a star employee, a RAG chatbot, who can instantly retrieve the exact information you need, thanks to a smart system called Vector Search. But here’s the catch: not everyone should access everything. You wouldn’t want someone in Finance snooping through Legal’s contracts, right? That’s where Access Control Lists (ACLs) come in. They act like security badges, ensuring only authorized team members can access their department’s data, keeping everything safe and sound.
In this blog post, we'll show how to implement ACLs in your RAG chatbot using Mosaic AI Vector Search. Whether you're a seasoned developer or just starting your RAG journey, this guide will equip you with the knowledge and tools to secure your chatbot and protect sensitive data.
All the code samples referenced in this blog are available in the accompanying GitHub repo.
Before diving into ACL, let's quickly revisit the concept of RAG. If you're already familiar with RAG, feel free to skip ahead to the ACL section.
A RAG chatbot combines the power of large language models (LLMs) with your own data to provide accurate and contextually relevant responses. Instead of relying solely on the LLM's pre-trained knowledge, a RAG chatbot can access and process information from your documents, databases, or any other data source.
This makes it ideal for applications like customer support assistants, internal knowledge-base Q&A, and enterprise document search.
Mosaic AI Vector Search offers powerful capabilities that make it easy to find relevant information within your data. Think of it like an online store recommending similar products based on your browsing history. Databricks allows you to "embed" your data into a vector space, making it easy to find similar items based on their meaning and context.
In Databricks’ definition, an Access Control List (ACL) is a set of permissions attached to objects within a system, specifying which users or system processes are granted access to those objects and what operations are allowed. ACLs are used to configure permissions to ensure that only authorized users can access specific data.
Let’s bring this into a company and department setting. Imagine your organization has departments like Finance and Legal, each handling sensitive and distinct data. Without ACLs, anyone might access critical financial reports or confidential employee documents, which could lead to breaches or errors. ACLs act as the gatekeeper, ensuring only Finance team members can access financial data and only HR personnel can access employee files. This not only protects sensitive information but also maintains clarity and efficiency by keeping each department focused on its own resources.
Access Control Lists (ACLs) are in our genes, reflecting the strong foundation of data governance in Databricks. Implementing RAG with ACLs offers a powerful way to manage data access while enhancing the capabilities of applications such as Q&A bots or recommendation systems. By combining the flexibility of vector search with robust metadata-based access control, you can ensure secure, role-specific data retrieval. This approach integrates seamlessly into existing workflows and enables fine-grained control over how data is accessed and used across different applications.
To achieve this, data is stored in a Delta table enriched with metadata columns, such as source or accessLevel, which define access rules. This Delta table is synced with the Databricks Vector Search engine, allowing queries to apply filters based on metadata. For instance, a "Public Q&A bot" may filter results to include only source="Public docs", while an "Internal Q&A bot" applies stricter criteria, such as accessLevel <= 2. These filters, passed through the Databricks Vector Search API or Python SDK, ensure users retrieve only the data they are authorized to access.
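To make this concrete, here is a minimal sketch of how such static filters can be passed through the Vector Search Python SDK; the endpoint, index, and column names here are assumptions for illustration:

from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="my_vs_endpoint", index_name="main.docs.docs_index"
)

# Public Q&A bot: only return chunks tagged as public documentation
public_hits = index.similarity_search(
    query_text="How do I reset my password?",
    columns=["chunk_id", "chunked_text", "source"],
    filters={"source": "Public docs"},
    num_results=5,
)

# Internal Q&A bot: allow anything at access level 2 or below
internal_hits = index.similarity_search(
    query_text="What is the on-call rotation?",
    columns=["chunk_id", "chunked_text", "accessLevel"],
    filters={"accessLevel <=": 2},
    num_results=5,
)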
However, what if the parameter is dynamic rather than static? In such cases, Databricks provides powerful capabilities to handle runtime parameters, enabling filters to be tailored dynamically based on the user’s query or role. For instance, a user's department or access level can be passed as a dynamic filter to the vector search query at runtime, ensuring that access controls adapt seamlessly to the specific context of each request. This approach allows for real-time customization of access rules, making it ideal for scenarios where user permissions vary or depend on runtime inputs.
In fact, this dynamic filtering capability is a key highlight of this blog, as it forms the foundation of the major demonstration we will showcase—illustrating how combining static and dynamic parameters can enforce ACLs effectively in a RAG workflow.
By combining metadata filtering with vector search, Databricks provides a scalable, secure framework for managing data access in applications that need role-specific insights, without compromising performance or usability.
Let’s dive into the implementation! We’ll use the Databricks platform and its products, combined with LangChain, to build a RAG chatbot with robust ACL security. Don’t worry—we’ll break everything down into simple, step-by-step instructions with clear and relatable explanations.
Each step is accompanied by key code snippets to guide you through the process. For a deeper dive, explore our GitHub repo featuring the end-to-end implementation or refer to Databricks' documentation.
The first step in implementing ACLs is identifying the users in your system and the roles they belong to. Roles help group users based on their access needs, making it easier to manage permissions.
In our example, users are grouped into two departments: Finance and HR. A Finance Analyst from the Finance department might analyze budgets, while an HR Specialist from the HR department handles employee records, ensuring clear separation of responsibilities.
Tagging data with security labels allows you to classify it based on sensitivity or ownership. These labels act as identifiers, enabling ACLs to enforce access restrictions efficiently.
To simplify this blog post, we synthesized two PDFs: one for the Finance department, the other for the HR department. We store the chunked text in a CSV file, load it into a Delta table, and use that table as the source for our vector search index. In real-world scenarios, you might encounter diverse data sources and formats like PDFs, PPTs, and Confluence pages. This often requires writing custom data parsing and chunking techniques to process the data into a usable format, similar to the sample data in this example. For guidance on handling different file formats, refer to the Databricks AI cookbook.
For ACLs, we create a department column as metadata to filter on. The Finance department introduction is tagged as "Finance" and the HR department introduction as "HR". When a user queries data, metadata filtering ensures only the relevant department's data is accessible.
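As a reference, here is a minimal sketch of loading the pre-chunked CSV into a Delta table; the file path and table name are assumptions. Note that Delta Sync indexes require Change Data Feed to be enabled on the source table:

# Load the pre-chunked CSV (with its department column) into a Delta table
df = spark.read.csv("/Volumes/main/rag/chunked_docs.csv", header=True)
df.write.format("delta").mode("overwrite").saveAsTable("main.rag.chunked_data")

# Delta Sync indexes require Change Data Feed on the source table
spark.sql(
    "ALTER TABLE main.rag.chunked_data "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)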
The next step is to use those chunks to create vector search indexes. Vector search indexes enable fast, semantic retrieval of data by mapping information into a searchable space. When ACLs are added to these indexes, they enforce strict access controls, ensuring users can only retrieve data that aligns with their roles and associated metadata.
In our case, the original data includes a department column that specifies which department should have access to each record, either Finance or HR; this column serves as metadata for the vector search index and allows the system to filter results during retrieval. By indexing the data with the department column as metadata and applying ACL filters on it, you ensure that Finance personnel access only financial data, while HR staff retrieve only HR-related information.
Create a Vector Search Endpoint
Begin by establishing a vector search endpoint, which serves as the interface for querying your vector indexes. If your admin has already created an endpoint for you, go ahead and use it! We save all configuration, including vector_search_endpoint_name, in one YAML file and use .get() to retrieve parameters.
# Load config files
import yaml
from pathlib import Path

conf = yaml.safe_load(Path("./config/rag_chain_config.yaml").read_text())
databricks_resources = conf.get("databricks_resources")
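If no endpoint exists yet, one can be created with the Vector Search client. A minimal sketch, reusing the name from the config file:

from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient(disable_notice=True)
vsc.create_endpoint(
    name=databricks_resources.get("vector_search_endpoint_name"),
    endpoint_type="STANDARD",
)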
Create a Delta Sync Index with ACLs
Next, create a Delta Sync Index for your dataset, specifying the source table and embedding configurations. Mosaic AI Vector Search provides a production-grade API to create Vector Search indexes and keep them updated automatically. Ensure that the source table includes the "department" column to facilitate ACL-based filtering.
# Create the vector search index
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient(disable_notice=True)

# Index creation takes a couple of minutes; wait until the index is ready
try:
    index = vsc.create_delta_sync_index(
        endpoint_name=databricks_resources.get("vector_search_endpoint_name"),
        index_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('vector_search_index')}",
        primary_key="chunk_id",
        source_table_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('chunked_data_table')}",
        pipeline_type="TRIGGERED",
        embedding_source_column="chunked_text",
        embedding_model_endpoint_name=databricks_resources.get(
            "embedding_endpoint_name"
        ),
    )
    display(index.describe())
except Exception as e:
    if "RESOURCE_ALREADY_EXISTS" in str(e):
        print("Index already exists. Skipping index creation.")
    else:
        raise e
By following these steps, you create vector search indexes that leverage department metadata for ACL-based filtering, ensuring secure and efficient data retrieval tailored to each department's needs.
The next question is how to implement a robust query processing pipeline with dynamic filtering, static configuration, and integration with the vector search index.
The chain serves as the backbone of the RAG chatbot, orchestrating the flow of data from user queries to context retrieval and response generation. Our example is set up using LangChain, but you are free to use other frameworks like LlamaIndex, AutoGen, or OpenAI with DSPy.
Static Input and Dynamic Config Setup
Static inputs represent fixed parameters that remain constant throughout query processing, such as the number of retrieved chunks k. In contrast, dynamic configurations adapt to runtime inputs, such as user-specific parameters supplied during execution, enabling context-aware ACL enforcement.
# Combine dynamic and static filters for vector search
from typing import Dict

def create_configurable_with_filters(input: Dict, retriever_config: Dict) -> Dict:
    """
    Create a configurable object with filters.

    Args:
        input: The input data, optionally containing per-request filters.
        retriever_config: Static retriever parameters loaded from the config file.

    Returns:
        A configurable object with filters added to the search_kwargs.
    """
    # Dynamic part: filters supplied at request time (e.g. the user's department)
    if "custom_inputs" in input:
        filters = input["custom_inputs"]["filters"]
    else:
        filters = {}
    # Static part: k and query_type come from the config file
    configurable = {
        "configurable": {
            "search_kwargs": {
                "k": retriever_config.get("parameters")["k"],
                "query_type": retriever_config.get("parameters")["query_type"],
                "filters": filters,
            }
        }
    }
    return configurable
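To illustrate what this function produces, here is a quick example with a stubbed retriever_config; the parameter values are assumptions:

retriever_config_stub = {"parameters": {"k": 3, "query_type": "ann"}}
finance_request = {"custom_inputs": {"filters": {"departments": "Finance"}}}

print(create_configurable_with_filters(finance_request, retriever_config_stub))
# {'configurable': {'search_kwargs': {'k': 3, 'query_type': 'ann',
#                                     'filters': {'departments': 'Finance'}}}}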
Build the Retrieval Component
The vector search index is connected to create a retriever component. This component translates the vector search functionality into a form compatible with LangChain’s retriever APIs, enabling seamless integration with the chain.
############
# Connect to the Vector Search Index
############
from databricks.vector_search.client import VectorSearchClient
from langchain_community.vectorstores import DatabricksVectorSearch
from langchain_core.runnables import ConfigurableField

vs_client = VectorSearchClient(disable_notice=True)
vs_index = vs_client.get_index(
    endpoint_name=databricks_resources.get("vector_search_endpoint_name"),
    index_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('vector_search_index')}",
)
vector_search_schema = retriever_config.get("schema")

############
# Turn the Vector Search index into a LangChain retriever
############
vector_search_as_retriever = DatabricksVectorSearch(
    vs_index,
    text_column=vector_search_schema.get("chunk_text"),
    columns=[
        vector_search_schema.get("primary_key"),
        vector_search_schema.get("chunk_text"),
        vector_search_schema.get("document_uri"),
    ],
).as_retriever(search_kwargs=retriever_config.get("parameters"))

# Expose search_kwargs as a configurable field so per-request ACL filters
# can be injected at invocation time
configurable_vs_retriever = vector_search_as_retriever.configurable_fields(
    search_kwargs=ConfigurableField(
        id="search_kwargs",
        name="Search Kwargs",
        description="The search kwargs to use",
    )
)
Chain Integration with Vector Search Retrieval
Lastly, we assemble everything together, integrating the retrieval component into the RAG chain, ensuring that user queries are processed with access filters and responses are formatted based on the retrieved context.
from operator import itemgetter

from databricks_langchain import ChatDatabricks
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

############
# Helper to extract the latest user message from the chat history
############
def extract_user_query_string(chat_messages_array):
    return chat_messages_array[-1]["content"]

############
# Method to format the docs returned by the retriever into the prompt
############
def format_context(docs):
    chunk_template = retriever_config.get("chunk_template")
    chunk_contents = [
        chunk_template.format(
            chunk_text=d.page_content,
            document_uri=d.metadata[vector_search_schema.get("document_uri")],
        )
        for d in docs
    ]
    return "".join(chunk_contents)

############
# Prompt template for generation
############
prompt = ChatPromptTemplate.from_messages(
    [
        (  # System prompt contains the instructions
            "system",
            llm_config.get("llm_system_prompt_template"),
        ),
        # User's question
        ("user", "{question}"),
    ]
)

############
# LLM for generation
############
model = ChatDatabricks(
    endpoint=databricks_resources.get("llm_endpoint_name"),
    extra_params=llm_config.get("llm_parameters"),
)

############
# RAG chain: extract the question, retrieve context with ACL filters applied,
# then generate a grounded answer
############
chain = (
    {
        "question": itemgetter("messages") | RunnableLambda(extract_user_query_string),
        "context": RunnablePassthrough()
        | RunnableLambda(
            lambda input: configurable_vs_retriever.invoke(
                extract_user_query_string(input["messages"]),
                config=create_configurable_with_filters(input, retriever_config),
            )
        )
        | RunnableLambda(format_context),
    }
    | prompt
    | model
    | StrOutputParser()
)
Testing the chain is a critical step before deploying the model. It ensures that access control filters are working as intended, preventing unauthorized access to sensitive data. By simulating queries with various roles and filters, you can verify that the system enforces ACLs correctly and retrieves data only for authorized users.
In our example, we test two cases: a Finance employee asking about HR information (which the department filter should block), and a request without filters that simulates an authorized HR employee asking the same question.
Note that we use the MLflow Tracing feature here. MLflow Tracing enables developers to capture inputs, parameters, and outputs at each step of the AI workflow. By integrating seamlessly with Databricks tools, MLflow Tracing provides interactive visualizations and comprehensive trace data, aiding in performance optimization, latency analysis, and cost measurement through token usage tracking. See MLflow tracing for agents.
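Enabling tracing for LangChain is a one-liner; with autologging on, each chain.invoke() call below is captured as a trace:

import mlflow

# Automatically trace LangChain invocations (inputs, retrieved chunks, outputs)
mlflow.langchain.autolog()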
Invoke the chain as a Finance department employee searching for HR information; with the Finance filter applied, HR content should be excluded from retrieval.
# Test with the Finance department filter applied
input_example = {
    "messages": [
        {
            "role": "user",
            "content": "Can you tell me about ABC company's HR department?",  # Replace with a question relevant to your use case
        }
    ],
    "custom_inputs": {"filters": {"departments": "Finance"}},
}
chain.invoke(input_example)
Invoke the chain without filters, simulating an HR department employee who is authorized to see HR information.
# Test with no filters
input_example = {
    "messages": [
        {
            "role": "user",
            "content": "Can you tell me about ABC company's HR department?",  # Replace with a question relevant to your use case
        }
    ],
}
chain.invoke(input_example)
Registering the RAG model with MLflow streamlines the management of its lifecycle. This includes logging the model, tracking its versions, and validating its functionality before deployment. With MLflow, you can ensure that any updates or improvements to the model are recorded systematically, making it easier to maintain and enhance over time. Additionally, MLflow enables seamless integration with resources like vector search indexes, serving endpoints, and SQL warehouses, ensuring efficient deployment and testing workflows.
Think of registering the RAG model as creating an official product catalog for your company’s star assistant. Just like documenting every new feature or update in an employee handbook, MLflow records every enhancement to the chatbot’s capabilities. For example, when updating the Finance department’s query handling logic or improving HR’s data access restrictions, MLflow ensures these changes are logged, versioned, and tested before being rolled out.
Infer Signature for Inputs and Outputs
The infer_signature function captures the structure of the input (CustomChatCompletionRequest) and output (StringResponse), ensuring consistent validation and compatibility.
# Define a custom schema to incorporate ACL filters in the chatbot request
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional

from mlflow.models import infer_signature
from mlflow.models.rag_signatures import (
    ChatCompletionRequest,
    StringResponse,
)


@dataclass
class CustomInputs:
    filters: Dict[str, str] = field(default_factory=lambda: {"departments": "*"})


# Additional input fields must be marked as Optional and have a default value
@dataclass
class CustomChatCompletionRequest(ChatCompletionRequest):
    custom_inputs: Optional[CustomInputs] = field(default_factory=CustomInputs)


signature = infer_signature(asdict(CustomChatCompletionRequest()), StringResponse())
Log and Register the Model
Log the model with associated configurations, resources, and requirements. By registering the chatbot model with MLflow, you create a central record for managing and evolving your RAG solution. This process ensures every update or feature is documented, tested, and deployed seamlessly, maintaining consistency and efficiency across departments like Finance and HR.
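The log_model call below passes a resources argument declaring the Databricks dependencies the model needs at serving time. As a reference, here is a minimal sketch of how that list might be built; the exact entries depend on your setup:

from mlflow.models.resources import (
    DatabricksServingEndpoint,
    DatabricksVectorSearchIndex,
)

resources = [
    DatabricksServingEndpoint(
        endpoint_name=databricks_resources.get("llm_endpoint_name")
    ),
    DatabricksServingEndpoint(
        endpoint_name=databricks_resources.get("embedding_endpoint_name")
    ),
    DatabricksVectorSearchIndex(
        index_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('vector_search_index')}"
    ),
]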
import os
import mlflow

with mlflow.start_run():
    model_info = mlflow.langchain.log_model(
        # Pass the path to the saved model file
        os.path.join(
            os.getcwd(),
            "02_single_turn_chatbot_with_acl",
        ),
        "agent",
        model_config="./config/rag_chain_config.yaml",
        input_example=input_example,
        signature=signature,
        pip_requirements=[
            "mlflow",
            "langchain_core",
            "databricks-langchain",
            "langchain-community",
            "databricks-vectorsearch",
        ],
        # Databricks dependencies (endpoints, index) declared above
        resources=resources,
        registered_model_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('model_name')}",
    )
Deploying the RAG model with the Mosaic AI agent framework is not only simple but also powerful. Along with deploying the model to a serving endpoint, the process automatically generates a review app. This review app provides an easy-to-use platform where subject matter experts (SMEs) can test the chatbot’s functionality while adhering to ACL rules. All user requests and responses are logged automatically, enabling downstream analysis, such as sentiment or topic analysis, to refine and optimize the chatbot’s performance.
In our example, imagine deploying a secure RAG chatbot for your company, enabling Finance and HR teams to test it with confidence. Without metadata filters, users could access HR information regardless of their department. However, by enabling input filters to reflect user roles, such as a Finance user querying the system, the chatbot ensures only relevant financial information is shared while HR data remains inaccessible. Similarly, HR users can retrieve HR-specific data in a controlled environment. The automatically generated review app empowers both teams to validate department-specific access easily.
from databricks import agents

# Deploy the model to the review app and a model serving endpoint
agents.deploy(
    f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('model_name')}",
    model_info.registered_model_version,
    endpoint_name=databricks_resources.get("chatbot_endpoint_name"),
)
With the Databricks API, deploying the RAG model is quick and efficient. The auto-generated review app not only facilitates secure testing for Finance and HR but also provides rich logging for downstream analysis.
What does that mean? All interactions within the review app are logged to an inference table, providing a rich dataset for downstream analysis. These logs are not just for record-keeping; they unlock opportunities to analyze trends, improve the chatbot's performance, and derive actionable insights. Here's how you can make the most of this feature:
- Query the inference table with SQL to monitor usage patterns, latency, and token consumption.
- Run sentiment or topic analysis on logged conversations to understand what users ask and how well the bot responds.
- Turn representative logged requests and responses into evaluation datasets for future iterations of the chatbot.
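For example, a quick look at the most recent logged requests; the payload table name here is an assumption based on default inference-table naming for the serving endpoint:

# Inspect the latest logged interactions from the review app
logs = spark.sql(
    """
    SELECT request, response, timestamp_ms
    FROM main.rag.chatbot_endpoint_payload
    ORDER BY timestamp_ms DESC
    LIMIT 10
    """
)
display(logs)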
By leveraging these tools, you can turn logged interactions into actionable insights, continuously improving your chatbot while maintaining security and scalability.
In this blog, we explored how to implement Retrieval-Augmented Generation (RAG) with Access Control Lists (ACLs) using Databricks. From defining user roles and tagging data with metadata to building secure chains and deploying the model with ease, Databricks provides a robust and scalable framework for managing data access in applications. By combining the power of vector search with metadata filtering, we ensure that only authorized users can retrieve data relevant to their roles, enhancing both security and usability.