Databricks Community

JohnnyA · ‎10-09-2025

We are planning to implement a chat interface in our portal application using the Genie Conversational API, where clients, partners, and internal users can ask questions in natural language and receive answers based on our data.

I have the following questions:

1. Authentication and Authorization for External Users

We don't want to create Databricks accounts for our clients and partners. Is there a way to pass a user identifier through the Conversational API that would allow us to programmatically enforce access controls? Specifically, we need to verify whether external users have permission to access specific tables and data without them having direct Databricks credentials.

2. Row-Level Security / Data Filtering

Our clients and partners have different data access levels (row-level permissions). Is there a mechanism within Genie to apply data filters based on the authenticated user before processing queries? For example:

Partner A should only see records related to their organization
Client B should only access their specific subset of data

How can we ensure Genie respects these data-level permissions when generating responses?

3. Limiting Genie's Response Scope

Currently, Genie answers generic questions outside our business domain, even with system-level instructions configured. For example, it will respond to questions like "What is the weather in Chicago?"

Is there a way to restrict Genie to only answer questions related to our specific data and business context, and politely decline or redirect out-of-scope queries?

We tried system-level instructions in the Genie space, but they didn't work.

Isi · ‎10-11-2025

Hello @JohnnyA

I don't have the full context, so I'll share a few ideas and hope one of them works for you.

1) Authentication & authorization for external users

Recommended (best practice):

Federated identity + on-behalf-of (OBO) access. Your portal authenticates users with your IdP (Entra, Okta, etc.), exchanges the IdP token for a Databricks OAuth token, and your backend calls the Genie Conversation API or SQL on behalf of the user. Result: per-user permissions, fine-grained audit, and least privilege, without manually creating accounts or issuing PATs to clients.

Alternative:

Run with a Service Principal (least privilege) and isolate each tenant with views/policies (or, e.g., one SP per partner). This is simpler operationally but loses per-user traceability and scales worse.
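
To make the token exchange in the recommended option more concrete, here is a minimal Python sketch that only builds the request your backend would send. The /oidc/v1/token path and the form parameters reflect my understanding of Databricks OAuth token federation, so please verify them against the current docs; the function name and the example host are hypothetical, and a federation policy trusting your IdP would need to be configured first.

```python
from urllib.parse import urlencode


def build_token_exchange_request(workspace_host: str, idp_jwt: str) -> tuple[str, str]:
    """Return (url, form_body) for exchanging an IdP JWT for a Databricks OAuth token.

    Assumption: the workspace exposes an OAuth token-exchange endpoint at
    /oidc/v1/token. Check the path and parameters against the docs.
    """
    url = f"https://{workspace_host}/oidc/v1/token"
    body = urlencode(
        {
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token": idp_jwt,  # the JWT issued by your IdP for the signed-in user
            "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
            "scope": "all-apis",
        }
    )
    return url, body


# Hypothetical host and token, for illustration only.
url, body = build_token_exchange_request("example.cloud.databricks.com", "abc.def.ghi")
print(url)
```

Your backend would POST the body as application/x-www-form-urlencoded and then use the returned access token as the Bearer token on the Genie calls made for that user.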

2) Row-level security / per-user filtering

Enforce security in the data layer, not in prompts:

Row filters (row-level filtering) and column masks (column-level masking) in Unity Catalog. Policies evaluate the current user at read time.
ABAC via governed tags: tag columns/objects (tenant, sensitivity, role) and define policies by attributes—this scales better than one-off rules.
Dynamic views for logic spanning multiple tables (handy for partners/clients with complex rules).

Genie will generate SQL against these tables and Unity Catalog will enforce the policies automatically.
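
As a rough illustration of the row-filter approach, this Python sketch generates the two SQL statements you would run once per table. All object names (main.portal.orders, main.portal.user_orgs, org_filter, org_id) are hypothetical, and the mapping-table pattern is just one possible design; adapt it to however you map users or service principals to organizations.

```python
def row_filter_statements(table: str, filter_fn: str, mapping_table: str, column: str) -> list[str]:
    """Build SQL that restricts rows to the organizations of the current user.

    Assumption: mapping_table has columns (user_name, org_id) listing which
    identity may see which organization. All names here are hypothetical.
    """
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {filter_fn}(row_org_id STRING) "
        f"RETURN EXISTS (SELECT 1 FROM {mapping_table} m "
        f"WHERE m.user_name = current_user() AND m.org_id = row_org_id)"
    )
    # Attach the function to the table; Unity Catalog then evaluates it on every read.
    apply_filter = f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({column})"
    return [create_fn, apply_filter]


for stmt in row_filter_statements(
    table="main.portal.orders",
    filter_fn="main.portal.org_filter",
    mapping_table="main.portal.user_orgs",
    column="org_id",
):
    print(stmt + ";")
```

Because the filter lives on the table, it applies to whatever SQL Genie generates, regardless of how the question was phrased.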

3) Restrict Genie to your business domain

Observed behavior (real test):

In a curated Space with a small set of tables (e.g., a single sales table), asking an off-topic question like “What’s the weather in Madrid?” yielded a refusal along the lines of:

“Your question is irrelevant to the provided database, as it does not contain information about the weather or temperatures in Madrid. Please ask questions related to the data available in the customers_orders table.”

In practice, when the Space is tight (few tables, strong instructions, example queries), I haven’t been able to force Genie to leave the Space’s domain.

How to make this reliable in production:

Curate the Space: keep very few tables/views, add clear instructions (“only answer using the provided datasets”), and include example queries. Always call the API with the correct space_id.
Portal “firewall”: before invoking Genie, run a simple in-scope check. If a question doesn’t map to your domain (no match to metrics/tables/terms), don’t call Genie. Return a friendly message:
“I can only answer questions about <your datasets>. Try asking about <examples>.”
(Optional) Pre-rewrite for clarity (LLM pass in your portal): add a lightweight LLM step that reformulates the user’s question without changing intent, just to resolve ambiguity and align it to your schema/terms.
- Example: user asks “Give me sales for 20189.” Genie might not know if “20189” is a typo for a year (2018/2019) or a product ID.
- Your pre-rewrite can use business rules (e.g., sales exist only 2019–2025; product IDs follow aaaaa-bbbb-cc) to produce a cleaner prompt or to route to the right Space.
- This improves answer quality when users lack data/Genie context, at the cost of a small extra step in your backend.
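
A minimal sketch of the portal-side check described above, in Python. The term list, the year range, and the messages are placeholders I made up for illustration; a real check would be built from your own metrics, table names, and business rules, and could be replaced by an LLM classifier.

```python
import re

# Hypothetical vocabulary for a sales-focused Space; replace with your own terms.
IN_SCOPE_TERMS = {"sales", "orders", "customers", "revenue", "product", "products"}
VALID_YEARS = range(2019, 2026)  # business rule from the example: sales exist only 2019-2025

DECLINE_MESSAGE = (
    "I can only answer questions about sales and orders. "
    "Try asking about monthly sales by product."
)


def is_in_scope(question: str) -> bool:
    """True if the question mentions at least one known business term."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    return bool(words & IN_SCOPE_TERMS)


def ambiguous_numbers(question: str) -> list[str]:
    """Numbers of four or more digits that are not a valid year (a crude heuristic)."""
    return [n for n in re.findall(r"\b\d{4,}\b", question) if int(n) not in VALID_YEARS]


def route(question: str) -> tuple[str, str]:
    """Decide what the backend does: decline, ask for clarification, or call Genie."""
    if not is_in_scope(question):
        return ("decline", DECLINE_MESSAGE)
    unclear = ambiguous_numbers(question)
    if unclear:
        return ("clarify", f"Did you mean a year between 2019 and 2025? I could not interpret: {', '.join(unclear)}")
    return ("genie", question)  # forward the question to the Genie Space unchanged


for q in ["What is the weather in Chicago?", "Give me sales for 20189.", "Give me sales for 2019"]:
    print(q, "->", route(q)[0])
```

Only questions routed as "genie" would reach the Conversation API, so off-topic questions never consume a Genie call.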

Hope this helps, 🙂

Isi
