Machine Learning

RBAC and VectorSearch

re
New Contributor II

When implementing the managed VectorSearch, what is the preferred way to implement row based access control? I see that you can use the filter API during a query, so simple filters using a certain column may work, but what if all the security information is in another table?

The use case in question is a RAG workflow in which some information should be restricted based on the querying user. The filter API probably works fine for a simple "information_deprecated" flag, but probably not for checking group membership.

2 REPLIES

Kaniz
Community Manager

Hi @re

  • As you mentioned, the filter API allows you to apply simple filters during a query. This approach works well for scenarios where you want to restrict access based on specific column values (e.g., an “information_deprecated” flag).
  • For instance, you can filter out documents where the “information_deprecated” flag is set to true.
  • However, this approach might not be suitable for more complex access control requirements, such as checking group membership.
  • If your security information (e.g., group membership) resides in a separate table, you’ll need a more sophisticated approach.
  • Consider creating a mapping between users and their associated groups. This mapping could be stored in a separate table or a graph structure.
  • When querying VectorSearch, use this mapping to determine which groups the querying user belongs to.
  • Then, apply appropriate filters based on group membership. For example:
    • If a user belongs to Group A, allow access to documents associated with Group A.
    • If a user belongs to Group B, limit results to documents associated with Group B.
  • RBAC is a powerful mechanism for managing access control. It allows you to define roles and assign permissions to those roles.
  • Create roles that correspond to different levels of access (e.g., read-only, read-write, admin).
  • Assign users to specific roles based on their group memberships or other criteria.
  • During VectorSearch queries, apply filters based on the user’s role. For example:
    • If a user has the “read-only” role, limit access to read-only documents.
    • If a user has the “admin” role, allow access to all documents.
  • Depending on your use case, consider fine-grained access control at the document level.
  • Attach metadata to each document indicating which groups or roles are allowed to access it.
  • During queries, use this metadata to filter out documents that the querying user is not authorized to see.
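
The group-membership steps above can be sketched roughly as follows. This is only a minimal illustration, not an official Databricks pattern: the `USER_GROUPS` mapping stands in for the separate security table, the `allowed_group` metadata column and every function name are hypothetical, and the shape of the filter dictionary (column name mapped to a list of allowed values) is an assumption that should be checked against the Vector Search filter documentation before use.

```python
from typing import Any, Dict, List, Set

# Hypothetical stand-in for the separate security table (user -> groups).
# In practice this would be read from wherever the membership data lives.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"group_a"},
    "bob": {"group_a", "group_b"},
}


def groups_for_user(user: str) -> List[str]:
    """Return the user's groups in a stable (sorted) order."""
    return sorted(USER_GROUPS.get(user, set()))


def build_group_filter(user: str, column: str = "allowed_group") -> Dict[str, List[str]]:
    """Build a filter that keeps only rows whose `column` is one of the user's groups.

    Assumption: the query's filter argument accepts {column: [values]} as an
    "is one of" condition. Users with no groups are rejected outright so that
    an empty filter can never be mistaken for "no restriction".
    """
    groups = groups_for_user(user)
    if not groups:
        raise PermissionError(f"user {user!r} belongs to no groups")
    return {column: groups}


def post_filter(rows: List[Dict[str, Any]], user: str,
                column: str = "allowed_group") -> List[Dict[str, Any]]:
    """Second check on returned rows, in case the query-side filter was skipped."""
    allowed = set(groups_for_user(user))
    return [row for row in rows if row.get(column) in allowed]


if __name__ == "__main__":
    # Rows as they might come back from a search; contents are made up.
    results = [
        {"id": 1, "text": "doc for A", "allowed_group": "group_a"},
        {"id": 2, "text": "doc for B", "allowed_group": "group_b"},
    ]
    print(build_group_filter("alice"))
    print([row["id"] for row in post_filter(results, "alice")])
```

The dictionary from `build_group_filter` would be handed to the filter argument of the query call, so the group lookup happens once per request rather than inside the index. The `post_filter` step is optional and only guards against a query that was issued without the filter. Note that this sketch assumes each document carries exactly one group; documents visible to several groups would need a different metadata layout.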

re
New Contributor II

Thanks AI for summarizing my question. However, you did not actually answer it.
