Imagine you’re running a company with multiple departments, like Finance, Legal, and HR. Each department has its own sensitive data—financial reports, legal contracts, and employee records—that need to stay private. Now, picture a star employee, a RAG chatbot, who can instantly retrieve the exact information you need, thanks to a smart system called Vector Search. But here’s the catch: not everyone should access everything. You wouldn’t want someone in Finance snooping through Legal’s contracts, right? That’s where Access Control Lists (ACLs) come in. They act like security badges, ensuring only authorized team members can access their department’s data, keeping everything safe and sound.
In this blog post, we'll show how to implement ACLs in your RAG chatbot using Mosaic AI Vector Search. Whether you're a seasoned developer or just starting your RAG journey, this guide will equip you with the knowledge and tools to secure your chatbot and protect sensitive data.
All the code samples referenced in this blog are available in the accompanying GitHub repo.
Before diving into ACL, let's quickly revisit the concept of RAG. If you're already familiar with RAG, feel free to skip ahead to the ACL section.
A RAG chatbot combines the power of large language models (LLMs) with your own data to provide accurate and contextually relevant responses. Instead of relying solely on the LLM's pre-trained knowledge, a RAG chatbot can access and process information from your documents, databases, or any other data source.
This makes it ideal for applications like customer support assistants, internal knowledge-base Q&A, and enterprise document search.
Mosaic AI Vector Search offers powerful capabilities that make it easy to find relevant information within your data. Think of it like an online store recommending similar products based on your browsing history. Databricks allows you to "embed" your data into a vector space, making it easy to find similar items based on their meaning and context.
In Databricks’ definition, an Access Control List (ACL) is a set of permissions attached to objects within a system, specifying which users or system processes are granted access to those objects and what operations are allowed. ACLs are used to configure permissions to ensure that only authorized users can access specific data.
Let’s bring this into a company and department setting. Imagine your organization has departments like Finance and Legal, each handling sensitive and distinct data. Without ACLs, anyone might access critical financial reports or confidential employee documents, which could lead to breaches or errors. ACLs act as the gatekeeper, ensuring only Finance team members can access financial data and only HR personnel can access employee files. This not only protects sensitive information but also maintains clarity and efficiency by keeping each department focused on its own resources.
Access Control Lists (ACLs) are in our genes, reflecting the strong foundation of data governance in Databricks. Implementing RAG with ACLs offers a powerful way to manage data access while enhancing the capabilities of applications such as Q&A bots or recommendation systems. By combining the flexibility of vector search with robust metadata-based access control, you can ensure secure, role-specific data retrieval. This approach integrates seamlessly into existing workflows and enables fine-grained control over how data is accessed and used across different applications.
To achieve this, data is stored in a Delta table enriched with metadata columns, such as source or accessLevel, which define access rules. This Delta table is synced with the Databricks Vector Search engine, allowing queries to apply filters based on metadata. For instance, a "Public Q&A bot" may filter results to include only source="Public docs", while an "Internal Q&A bot" applies stricter criteria, such as accessLevel <= 2. These filters, passed through the Databricks Vector Search API or Python SDK, ensure users retrieve only the data they are authorized to access.
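To make this concrete, here is a minimal sketch of how such static filters can be passed through the Vector Search Python SDK; the endpoint, index, and column names here are assumptions for illustration:

from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="my_vs_endpoint", index_name="main.docs.docs_index"
)

# Public Q&A bot: only return chunks tagged as public documentation
public_hits = index.similarity_search(
    query_text="How do I reset my password?",
    columns=["chunk_id", "chunked_text", "source"],
    filters={"source": "Public docs"},
    num_results=5,
)

# Internal Q&A bot: allow anything at access level 2 or below
internal_hits = index.similarity_search(
    query_text="What is the on-call rotation?",
    columns=["chunk_id", "chunked_text", "accessLevel"],
    filters={"accessLevel <=": 2},
    num_results=5,
)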
However, what if the parameter is dynamic rather than static? In such cases, Databricks provides powerful capabilities to handle runtime parameters, enabling filters to be tailored dynamically based on the user’s query or role. For instance, a user's department or access level can be passed as a dynamic filter to the vector search query at runtime, ensuring that access controls adapt seamlessly to the specific context of each request. This approach allows for real-time customization of access rules, making it ideal for scenarios where user permissions vary or depend on runtime inputs.
In fact, this dynamic filtering capability is a key highlight of this blog, as it forms the foundation of the major demonstration we will showcase—illustrating how combining static and dynamic parameters can enforce ACLs effectively in a RAG workflow.
By combining metadata filtering with vector search, Databricks provides a scalable, secure framework for managing data access in applications that need role-specific insights, without compromising performance or usability.
Let’s dive into the implementation! We’ll use the Databricks platform and its products, combined with LangChain, to build a RAG chatbot with robust ACL security. Don’t worry—we’ll break everything down into simple, step-by-step instructions with clear and relatable explanations.
Each step is accompanied by key code snippets to guide you through the process. For a deeper dive, explore our GitHub repo featuring the end-to-end implementation or refer to Databricks' documentation.
The first step in implementing ACLs is identifying the users in your system and the roles they belong to. Roles help group users based on their access needs, making it easier to manage permissions.
In our example, users are grouped into two departments: Finance and HR. A Finance Analyst from the Finance department might analyze budgets, while an HR Specialist from the HR department handles employee records, ensuring clear separation of responsibilities.
Tagging data with security labels allows you to classify it based on sensitivity or ownership. These labels act as identifiers, enabling ACLs to enforce access restrictions efficiently.
To simplify this blog post, we synthesized two PDFs: one for the Finance department, the other for the HR department. We store the chunked text in a CSV file, load it into a Delta table, and use that table as the source for our vector search index. In real-world scenarios, you might encounter diverse data sources and formats like PDFs, PPTs, and Confluence pages. This often requires writing custom data parsing and chunking techniques to process the data into a usable format, similar to the sample data in this example. For guidance on handling different file formats, refer to the Databricks AI cookbook.
For ACLs, we create a department column as metadata to filter on. The Finance department introduction is tagged as "Finance" and the HR department introduction as "HR". When a user queries data, metadata filtering ensures only the relevant department's data is accessible.
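As a reference, here is a minimal sketch of loading the pre-chunked CSV into a Delta table; the file path and table name are assumptions. Note that Delta Sync indexes require Change Data Feed to be enabled on the source table:

# Load the pre-chunked CSV (with its department column) into a Delta table
df = spark.read.csv("/Volumes/main/rag/chunked_docs.csv", header=True)
df.write.format("delta").mode("overwrite").saveAsTable("main.rag.chunked_data")

# Delta Sync indexes require Change Data Feed on the source table
spark.sql(
    "ALTER TABLE main.rag.chunked_data "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)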
The next step is to use those chunks to create vector search indexes. Vector search indexes enable fast, semantic retrieval of data by mapping information into a searchable space. When ACLs are added to these indexes, they enforce strict access controls, ensuring users can only retrieve data that aligns with their roles and associated metadata.
In our case, the original data includes a department column that specifies which department should have access to each record, either Finance or HR; this column serves as metadata for the vector search index and allows the system to filter results during retrieval. By indexing the data with the department column as metadata and applying ACL filters on it, you ensure that Finance personnel access only financial data, while HR staff retrieve only HR-related information.
Create a Vector Search Endpoint
Begin by establishing a vector search endpoint, which serves as the interface for querying your vector indexes. If your admin has already created an endpoint for you, go ahead and use it! We save all configuration, including vector_search_endpoint_name, in one YAML file and use .get() to retrieve parameters.
# Load config files
import yaml
from pathlib import Path

conf = yaml.safe_load(Path("./config/rag_chain_config.yaml").read_text())
databricks_resources = conf.get("databricks_resources")
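If no endpoint exists yet, one can be created with the Vector Search client. A minimal sketch, reusing the name from the config file:

from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient(disable_notice=True)
vsc.create_endpoint(
    name=databricks_resources.get("vector_search_endpoint_name"),
    endpoint_type="STANDARD",
)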
Create a Delta Sync Index with ACLs
Next, create a Delta Sync Index for your dataset, specifying the source table and embedding configurations. Mosaic AI Vector Search provides a production-grade API to create Vector Search indexes and keep them updated automatically. Ensure that the source table includes the "department" column to facilitate ACL-based filtering.
# Create the vector search index
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient(disable_notice=True)

# Index creation takes a couple of minutes; wait until the index is ready
try:
    index = vsc.create_delta_sync_index(
        endpoint_name=databricks_resources.get("vector_search_endpoint_name"),
        index_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('vector_search_index')}",
        primary_key="chunk_id",
        source_table_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('chunked_data_table')}",
        pipeline_type="TRIGGERED",
        embedding_source_column="chunked_text",
        embedding_model_endpoint_name=databricks_resources.get(
            "embedding_endpoint_name"
        ),
    )
    display(index.describe())
except Exception as e:
    if "RESOURCE_ALREADY_EXISTS" in str(e):
        print("Index already exists. Skipping index creation.")
    else:
        raise e
By following these steps, you create vector search indexes that leverage department metadata for ACL-based filtering, ensuring secure and efficient data retrieval tailored to each department's needs.
The next question is how to implement a robust query processing pipeline with dynamic filtering, static configuration, and integration with the vector search index.
The chain serves as the backbone of the RAG chatbot, orchestrating the flow of data from user queries to context retrieval and response generation. Our example is set up using LangChain, but you are free to use other frameworks like LlamaIndex, AutoGen, or OpenAI with DSPy.
Static Input and Dynamic Config Setup
Static inputs represent fixed parameters that remain constant throughout query processing, such as the number of retrieved chunks k. In contrast, dynamic configurations adapt to runtime inputs, such as user-specific parameters supplied during execution, enabling context-aware ACL enforcement.
# Combine dynamic and static filters for vector search
from typing import Dict

def create_configurable_with_filters(input: Dict, retriever_config: Dict) -> Dict:
    """
    Create a configurable object with filters.

    Args:
        input: The input data, optionally containing per-request filters.
        retriever_config: Static retriever parameters loaded from the config file.

    Returns:
        A configurable object with filters added to the search_kwargs.
    """
    # Dynamic part: filters supplied at request time (e.g. the user's department)
    if "custom_inputs" in input:
        filters = input["custom_inputs"]["filters"]
    else:
        filters = {}
    # Static part: k and query_type come from the config file
    configurable = {
        "configurable": {
            "search_kwargs": {
                "k": retriever_config.get("parameters")["k"],
                "query_type": retriever_config.get("parameters")["query_type"],
                "filters": filters,
            }
        }
    }
    return configurable
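To illustrate what this function produces, here is a quick example with a stubbed retriever_config; the parameter values are assumptions:

retriever_config_stub = {"parameters": {"k": 3, "query_type": "ann"}}
finance_request = {"custom_inputs": {"filters": {"departments": "Finance"}}}

print(create_configurable_with_filters(finance_request, retriever_config_stub))
# {'configurable': {'search_kwargs': {'k': 3, 'query_type': 'ann',
#                                     'filters': {'departments': 'Finance'}}}}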
Build the Retrieval Component
The vector search index is connected to create a retriever component. This component translates the vector search functionality into a form compatible with LangChain’s retriever APIs, enabling seamless integration with the chain.
############
# Connect to the Vector Search Index
############
from databricks.vector_search.client import VectorSearchClient
from langchain_community.vectorstores import DatabricksVectorSearch
from langchain_core.runnables import ConfigurableField

vs_client = VectorSearchClient(disable_notice=True)
vs_index = vs_client.get_index(
    endpoint_name=databricks_resources.get("vector_search_endpoint_name"),
    index_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('vector_search_index')}",
)
vector_search_schema = retriever_config.get("schema")

############
# Turn the Vector Search index into a LangChain retriever
############
vector_search_as_retriever = DatabricksVectorSearch(
    vs_index,
    text_column=vector_search_schema.get("chunk_text"),
    columns=[
        vector_search_schema.get("primary_key"),
        vector_search_schema.get("chunk_text"),
        vector_search_schema.get("document_uri"),
    ],
).as_retriever(search_kwargs=retriever_config.get("parameters"))

# Expose search_kwargs as a configurable field so per-request ACL filters
# can be injected at invocation time
configurable_vs_retriever = vector_search_as_retriever.configurable_fields(
    search_kwargs=ConfigurableField(
        id="search_kwargs",
        name="Search Kwargs",
        description="The search kwargs to use",
    )
)
Chain Integration with Vector Search Retrieval
Lastly, we assemble everything together, integrating the retrieval component into the RAG chain, ensuring that user queries are processed with access filters and responses are formatted based on the retrieved context.
from operator import itemgetter

from databricks_langchain import ChatDatabricks
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

############
# Helper to extract the latest user message from the chat history
############
def extract_user_query_string(chat_messages_array):
    return chat_messages_array[-1]["content"]

############
# Method to format the docs returned by the retriever into the prompt
############
def format_context(docs):
    chunk_template = retriever_config.get("chunk_template")
    chunk_contents = [
        chunk_template.format(
            chunk_text=d.page_content,
            document_uri=d.metadata[vector_search_schema.get("document_uri")],
        )
        for d in docs
    ]
    return "".join(chunk_contents)

############
# Prompt template for generation
############
prompt = ChatPromptTemplate.from_messages(
    [
        (  # System prompt contains the instructions
            "system",
            llm_config.get("llm_system_prompt_template"),
        ),
        # User's question
        ("user", "{question}"),
    ]
)

############
# LLM for generation
############
model = ChatDatabricks(
    endpoint=databricks_resources.get("llm_endpoint_name"),
    extra_params=llm_config.get("llm_parameters"),
)

############
# RAG chain: extract the question, retrieve context with ACL filters applied,
# then generate a grounded answer
############
chain = (
    {
        "question": itemgetter("messages") | RunnableLambda(extract_user_query_string),
        "context": RunnablePassthrough()
        | RunnableLambda(
            lambda input: configurable_vs_retriever.invoke(
                extract_user_query_string(input["messages"]),
                config=create_configurable_with_filters(input, retriever_config),
            )
        )
        | RunnableLambda(format_context),
    }
    | prompt
    | model
    | StrOutputParser()
)
Testing the chain is a critical step before deploying the model. It ensures that access control filters are working as intended, preventing unauthorized access to sensitive data. By simulating queries with various roles and filters, you can verify that the system enforces ACLs correctly and retrieves data only for authorized users.
In our example, we test two cases: a Finance employee asking about HR information (which the department filter should block), and a request without filters that simulates an authorized HR employee asking the same question.
Note that we use the MLflow Tracing feature here. MLflow Tracing enables developers to capture inputs, parameters, and outputs at each step of the AI workflow. By integrating seamlessly with Databricks tools, MLflow Tracing provides interactive visualizations and comprehensive trace data, aiding in performance optimization, latency analysis, and cost measurement through token usage tracking. See MLflow tracing for agents.
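Enabling tracing for LangChain is a one-liner; with autologging on, each chain.invoke() call below is captured as a trace:

import mlflow

# Automatically trace LangChain invocations (inputs, retrieved chunks, outputs)
mlflow.langchain.autolog()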
Invoke the chain as a Finance department employee searching for HR information; with the Finance filter applied, HR content should be excluded from retrieval.
# Test with the Finance department filter applied
input_example = {
    "messages": [
        {
            "role": "user",
            "content": "Can you tell me about ABC company's HR department?",  # Replace with a question relevant to your use case
        }
    ],
    "custom_inputs": {"filters": {"departments": "Finance"}},
}
chain.invoke(input_example)
Invoke the chain without filters, simulating an HR department employee who is authorized to see HR information.
# Test with no filters
input_example = {
    "messages": [
        {
            "role": "user",
            "content": "Can you tell me about ABC company's HR department?",  # Replace with a question relevant to your use case
        }
    ],
}
chain.invoke(input_example)
Registering the RAG model with MLflow streamlines the management of its lifecycle. This includes logging the model, tracking its versions, and validating its functionality before deployment. With MLflow, you can ensure that any updates or improvements to the model are recorded systematically, making it easier to maintain and enhance over time. Additionally, MLflow enables seamless integration with resources like vector search indexes, serving endpoints, and SQL warehouses, ensuring efficient deployment and testing workflows.
Think of registering the RAG model as creating an official product catalog for your company’s star assistant. Just like documenting every new feature or update in an employee handbook, MLflow records every enhancement to the chatbot’s capabilities. For example, when updating the Finance department’s query handling logic or improving HR’s data access restrictions, MLflow ensures these changes are logged, versioned, and tested before being rolled out.
Infer Signature for Inputs and Outputs
The infer_signature function captures the structure of the input (CustomChatCompletionRequest) and output (StringResponse), ensuring consistent validation and compatibility.
# Define a custom schema to incorporate ACL filters in the chatbot request
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional

from mlflow.models import infer_signature
from mlflow.models.rag_signatures import (
    ChatCompletionRequest,
    StringResponse,
)


@dataclass
class CustomInputs:
    filters: Dict[str, str] = field(default_factory=lambda: {"departments": "*"})


# Additional input fields must be marked as Optional and have a default value
@dataclass
class CustomChatCompletionRequest(ChatCompletionRequest):
    custom_inputs: Optional[CustomInputs] = field(default_factory=CustomInputs)


signature = infer_signature(asdict(CustomChatCompletionRequest()), StringResponse())
Log and Register the Model
Log the model with associated configurations, resources, and requirements. By registering the chatbot model with MLflow, you create a central record for managing and evolving your RAG solution. This process ensures every update or feature is documented, tested, and deployed seamlessly, maintaining consistency and efficiency across departments like Finance and HR.
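The log_model call below passes a resources argument declaring the Databricks dependencies the model needs at serving time. As a reference, here is a minimal sketch of how that list might be built; the exact entries depend on your setup:

from mlflow.models.resources import (
    DatabricksServingEndpoint,
    DatabricksVectorSearchIndex,
)

resources = [
    DatabricksServingEndpoint(
        endpoint_name=databricks_resources.get("llm_endpoint_name")
    ),
    DatabricksServingEndpoint(
        endpoint_name=databricks_resources.get("embedding_endpoint_name")
    ),
    DatabricksVectorSearchIndex(
        index_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('vector_search_index')}"
    ),
]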
import os
import mlflow

with mlflow.start_run():
    model_info = mlflow.langchain.log_model(
        # Pass the path to the saved model file
        os.path.join(
            os.getcwd(),
            "02_single_turn_chatbot_with_acl",
        ),
        "agent",
        model_config="./config/rag_chain_config.yaml",
        input_example=input_example,
        signature=signature,
        pip_requirements=[
            "mlflow",
            "langchain_core",
            "databricks-langchain",
            "langchain-community",
            "databricks-vectorsearch",
        ],
        # Databricks dependencies (endpoints, index) declared above
        resources=resources,
        registered_model_name=f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('model_name')}",
    )
Deploying the RAG model with the Mosaic AI agent framework is not only simple but also powerful. Along with deploying the model to a serving endpoint, the process automatically generates a review app. This review app provides an easy-to-use platform where subject matter experts (SMEs) can test the chatbot’s functionality while adhering to ACL rules. All user requests and responses are logged automatically, enabling downstream analysis, such as sentiment or topic analysis, to refine and optimize the chatbot’s performance.
In our example, imagine deploying a secure RAG chatbot for your company, enabling Finance and HR teams to test it with confidence. Without metadata filters, users could access HR information regardless of their department. However, by enabling input filters to reflect user roles, such as a Finance user querying the system, the chatbot ensures only relevant financial information is shared while HR data remains inaccessible. Similarly, HR users can retrieve HR-specific data in a controlled environment. The automatically generated review app empowers both teams to validate department-specific access easily.
from databricks import agents

# Deploy the model to the review app and a model serving endpoint
agents.deploy(
    f"{databricks_resources.get('catalog')}.{databricks_resources.get('schema')}.{databricks_resources.get('model_name')}",
    model_info.registered_model_version,
    endpoint_name=databricks_resources.get("chatbot_endpoint_name"),
)
With the Databricks API, deploying the RAG model is quick and efficient. The auto-generated review app not only facilitates secure testing for Finance and HR but also provides rich logging for downstream analysis.
What does that mean? All interactions within the review app are logged to an inference table, providing a rich dataset for downstream analysis. These logs are not just for record-keeping; they unlock opportunities to analyze trends, improve the chatbot's performance, and derive actionable insights. Here's how you can make the most of this feature:
- Query the inference table with SQL to monitor usage patterns, latency, and token consumption.
- Run sentiment or topic analysis on logged conversations to understand what users ask and how well the bot responds.
- Turn representative logged requests and responses into evaluation datasets for future iterations of the chatbot.
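For example, a quick look at the most recent logged requests; the payload table name here is an assumption based on default inference-table naming for the serving endpoint:

# Inspect the latest logged interactions from the review app
logs = spark.sql(
    """
    SELECT request, response, timestamp_ms
    FROM main.rag.chatbot_endpoint_payload
    ORDER BY timestamp_ms DESC
    LIMIT 10
    """
)
display(logs)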
By leveraging these tools, you can turn logged interactions into actionable insights, continuously improving your chatbot while maintaining security and scalability.
In this blog, we explored how to implement Retrieval-Augmented Generation (RAG) with Access Control Lists (ACLs) using Databricks. From defining user roles and tagging data with metadata to building secure chains and deploying the model with ease, Databricks provides a robust and scalable framework for managing data access in applications. By combining the power of vector search with metadata filtering, we ensure that only authorized users can retrieve data relevant to their roles, enhancing both security and usability.