Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Using Genie Conversational API with External Users and Data-Level Security

JohnnyA
New Contributor

We are planning to implement a chat interface in our portal application using the Genie Conversational API, where clients, partners, and internal users can ask questions in natural language and receive answers based on our data.

I have the following questions:

1. Authentication and Authorization for External Users

We don't want to create Databricks accounts for our clients and partners. Is there a way to pass a user identifier through the Conversational API that would allow us to programmatically enforce access controls? Specifically, we need to verify whether external users have permission to access specific tables and data without them having direct Databricks credentials.

2. Row-Level Security / Data Filtering

Our clients and partners have different data access levels (row-level permissions). Is there a mechanism within Genie to apply data filters based on the authenticated user before processing queries? For example:

  • Partner A should only see records related to their organization
  • Client B should only access their specific subset of data

How can we ensure Genie respects these data-level permissions when generating responses?

3. Limiting Genie's Response Scope

Currently, Genie answers generic questions outside our business domain, even with system-level instructions configured. For example, it will respond to questions like "What is the weather in Chicago?"

Is there a way to restrict Genie to only answer questions related to our specific data and business context, and politely decline or redirect out-of-scope queries?

We tried adding system-level instructions in the Genie space, but that didn't help.



 


Isi
Honored Contributor III

Hello @JohnnyA 

I don't have the whole context, so I'll outline a few ideas and hope something works for you.

1) Authentication & authorization for external users

Recommended (best practice):

Federated identity + OBO (on-behalf-of). Your portal authenticates users with your IdP (Entra ID, Okta, etc.), exchanges the IdP token for a Databricks OAuth token, and your backend calls the Genie Conversation API (or SQL) on behalf of that user. Result: per-user permissions, fine-grained auditing, and least privilege, without manually creating accounts or issuing PATs (personal access tokens) to clients.
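A minimal sketch of that flow, assuming OAuth token federation is already configured for your workspace (a federation policy that trusts your IdP). It also assumes the token-exchange endpoint is `/oidc/v1/token` and the Genie endpoint is `/api/2.0/genie/spaces/{space_id}/start-conversation`; verify both against the current Databricks docs. The code only builds the HTTP requests (standard library, nothing is sent), and the host, space ID, and token values are placeholders.

```python
import json
import urllib.parse
import urllib.request

# RFC 8693 token-exchange identifiers; assumed to be what the workspace endpoint expects.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_token_exchange_request(host: str, idp_jwt: str) -> urllib.request.Request:
    """Build (not send) a request that swaps the user's IdP JWT for a Databricks OAuth token."""
    form = urllib.parse.urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": idp_jwt,
        "subject_token_type": JWT_TOKEN_TYPE,
        "scope": "all-apis",
    }).encode()
    return urllib.request.Request(
        f"https://{host}/oidc/v1/token",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_start_conversation_request(
    host: str, space_id: str, user_token: str, question: str
) -> urllib.request.Request:
    """Build a Genie start-conversation call authenticated as the end user, not the portal."""
    return urllib.request.Request(
        f"https://{host}/api/2.0/genie/spaces/{space_id}/start-conversation",
        data=json.dumps({"content": question}).encode(),
        headers={
            "Authorization": f"Bearer {user_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In the backend you would send the first request with `urllib.request.urlopen`, read `access_token` from the JSON response, and pass it as `user_token` to the second. Genie should then run its generated SQL as that user, so Unity Catalog permissions apply per user.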

Alternative:

Run as a service principal (SP) with least privilege and isolate each tenant with views/policies (or, e.g., use one SP per partner). This is operationally simpler, but you lose per-user traceability and it scales worse.
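For the one-SP-per-partner variant, here is a sketch of per-tenant routing. The portal maps its own tenant ID to a least-privilege SP and requests a machine-to-machine OAuth token with the client-credentials grant. That grant is assumed here to use the same `/oidc/v1/token` endpoint with `scope=all-apis`. `TENANT_SPS`, the tenant IDs, and the credential values are hypothetical; real secrets belong in a secret manager.

```python
import base64
import urllib.parse
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePrincipal:
    client_id: str
    client_secret: str  # placeholder: load from a secret manager, never hard-code


# Hypothetical registry: portal tenant ID -> one least-privilege SP per partner/client.
TENANT_SPS = {
    "partner_a": ServicePrincipal("sp-partner-a-client-id", "placeholder-secret-a"),
    "client_b": ServicePrincipal("sp-client-b-client-id", "placeholder-secret-b"),
}


def sp_for_tenant(tenant_id: str) -> ServicePrincipal:
    """Fail closed: a tenant without a configured SP gets no credentials at all."""
    try:
        return TENANT_SPS[tenant_id]
    except KeyError:
        raise PermissionError(f"no service principal configured for {tenant_id!r}") from None


def build_m2m_token_request(host: str, sp: ServicePrincipal) -> urllib.request.Request:
    """Build (not send) an OAuth client-credentials token request for the tenant's SP."""
    basic = base64.b64encode(f"{sp.client_id}:{sp.client_secret}".encode()).decode()
    form = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": "all-apis"}).encode()
    return urllib.request.Request(
        f"https://{host}/oidc/v1/token",
        data=form,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Unknown tenants raise an error instead of falling back to a shared SP. A misconfigured tenant therefore fails closed rather than seeing another tenant's data.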

2) Row-level security / per-user filtering

Enforce security in the data layer, not in prompts:

  • Row filters (row-level filtering) and column masks (column-level masking) in Unity Catalog. Policies evaluate the current user at read time.

  • ABAC (attribute-based access control) via governed tags: tag columns/objects (tenant, sensitivity, role) and define policies by attribute; this scales better than one-off rules.

  • Dynamic views for logic spanning multiple tables (handy for partners/clients with complex rules).

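Genie generates SQL against these tables, and Unity Catalog enforces the policies automatically. Because the policies evaluate the identity that runs the query, per-user filtering depends on calling Genie on behalf of the user (option 1 above); with a shared service principal, every request is evaluated as that SP.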
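Here is a sketch of what such Unity Catalog policies might look like. It is written as a small Python helper that emits the SQL, so the same pattern can be applied to many tables (e.g. with `spark.sql(stmt)` in a notebook). The sketch makes three assumptions: a hypothetical mapping table with columns `(user_name, org_id)` that lists which organizations each user may see, a hypothetical `admins` group that bypasses the filter, and identifiers that come from trusted configuration, never from user input. Check the row filter and column mask docs for the exact restrictions on filter functions, including subqueries against mapping tables.

```python
def row_filter_ddl(table: str, tenant_col: str, func: str, mapping_table: str) -> list[str]:
    """SQL restricting `table` to rows whose tenant value the current user is mapped to."""
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {func}(p_org_id STRING)\n"
        "RETURN is_account_group_member('admins')\n"
        f"  OR EXISTS (SELECT 1 FROM {mapping_table} m\n"
        "             WHERE m.user_name = current_user() AND m.org_id = p_org_id)"
    )
    # The filter function receives the tenant column's value for each row.
    attach = f"ALTER TABLE {table} SET ROW FILTER {func} ON ({tenant_col})"
    return [create_fn, attach]


def column_mask_ddl(table: str, column: str, func: str, allowed_group: str) -> list[str]:
    """SQL showing `column` in clear text only to members of `allowed_group`."""
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {func}(val STRING)\n"
        f"RETURN CASE WHEN is_account_group_member('{allowed_group}') THEN val ELSE '****' END"
    )
    attach = f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {func}"
    return [create_fn, attach]
```

Keeping tenant membership in a table (rather than one function per partner) means onboarding a new partner is a row insert, not a schema change.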

3) Restrict Genie to your business domain

Observed behavior (real test):

In a curated Space with a small set of tables (e.g., a single sales table), asking an off-topic question like “What’s the weather in Madrid?” yielded a refusal along the lines of:

“Your question is irrelevant to the provided database, as it does not contain information about the weather or temperatures in Madrid. Please ask questions related to the data available in the customers_orders table.”

In practice, when the Space is tightly scoped (few tables, strong instructions, example queries), I haven’t been able to push Genie outside the Space’s domain.

How to make this reliable in production:

  1. Curate the Space: keep very few tables/views, add clear instructions (“only answer using the provided datasets”), and include example queries. Always call the API with the correct space_id.

  2. Portal “firewall”: before invoking Genie, run a simple in-scope check. If a question doesn’t map to your domain (no match to your metrics, tables, or terms), don’t call Genie; return a friendly message instead:

    “I can only answer questions about <your datasets>. Try asking about <examples>.”

    (Optional) Pre-rewrite for clarity (LLM pass in your portal): add a lightweight LLM step that reformulates the user’s question without changing its intent, just to resolve ambiguity and align it with your schema/terms.

    • Example: a user asks “Give me sales for 20189.” Genie might not know whether “20189” is a typo for a year (2018/2019) or a product ID.

    • Your pre-rewrite can use business rules (e.g., sales exist only for 2019–2025; product IDs follow aaaaa-bbbb-cc) to produce a cleaner prompt or to route the question to the right Space.

    • This improves answer quality when users lack context about the data or Genie, at the cost of a small extra step in your backend.
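The following is a deterministic sketch of the portal-side steps. It runs a keyword-based in-scope check, then applies a business-rule rewrite for the “20189” case. Everything specific here is an assumption: the vocabulary, the product-ID pattern (read as 5-4-2 alphanumeric characters), the 2019–2025 year range, and the refusal text. A production check would more likely use an embedding or LLM classifier, and this rule step could run before the optional LLM rewrite.

```python
import re

# All values below are illustrative assumptions; derive them from your own Space.
DOMAIN_TERMS = {"sales", "revenue", "orders", "customers", "product", "products", "region"}
PRODUCT_ID = re.compile(r"\b[A-Za-z0-9]{5}-[A-Za-z0-9]{4}-[A-Za-z0-9]{2}\b")  # "aaaaa-bbbb-cc"
FIVE_DIGITS = re.compile(r"(?<![\w-])\d{5}(?![\w-])")  # standalone 5-digit numbers, not inside IDs
FIRST_YEAR, LAST_YEAR = 2019, 2025  # years that actually have sales data
OUT_OF_SCOPE_MSG = (
    "I can only answer questions about our sales and orders data. "
    "Try asking about revenue by region or top products."
)


def in_scope(question: str) -> bool:
    """Crude firewall: in scope if the question mentions a domain term or a product ID."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return bool(words & DOMAIN_TERMS) or PRODUCT_ID.search(question) is not None


def year_candidates(token: str) -> list[int]:
    """Valid years reachable by deleting one digit, e.g. from a typo like '20189'."""
    found = {int(token[:i] + token[i + 1:]) for i in range(len(token))}
    return sorted(y for y in found if FIRST_YEAR <= y <= LAST_YEAR)


def preprocess(question: str) -> tuple[bool, str]:
    """Return (call_genie, text): a cleaned prompt for Genie, or a reply for the user."""
    if not in_scope(question):
        return False, OUT_OF_SCOPE_MSG
    for token in FIVE_DIGITS.findall(question):
        years = year_candidates(token)
        if len(years) > 1:  # genuinely ambiguous: ask the user rather than guess
            return False, f"Did you mean one of these years: {', '.join(map(str, years))}?"

    def repair(match: re.Match) -> str:
        # Replace a number only when exactly one valid year explains it.
        years = year_candidates(match.group())
        return str(years[0]) if len(years) == 1 else match.group()

    return True, FIVE_DIGITS.sub(repair, question)
```

With these rules, “Give me sales for 20189.” becomes “Give me sales for 2019.”. 2018 falls outside the assumed data range, and 20189 cannot be a product ID under the assumed pattern. An input like 20223 maps to both 2022 and 2023, so the portal asks the user instead of guessing.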

       

Hope this helps 🙂

Isi

 

JohnnyA
New Contributor

Really appreciate your time and support!
