Hello @JohnnyA
I don't have the full context, so I'll lay out a few ideas and hope one of them fits your case.
1) Authentication & authorization for external users
Recommended (best practice):
Federated identity + on-behalf-of (OBO). Your portal authenticates with your IdP (Entra, Okta, etc.), exchanges the IdP token for a Databricks OAuth token, and your backend calls the Genie Conversation API or SQL on behalf of the user. Result: per-user permissions, fine-grained audit, and least privilege, without creating manual accounts or issuing PATs to clients.
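A minimal sketch of the two backend requests in that flow, assuming token federation is already configured for your IdP. The `/oidc/v1/token` path, the form parameters, and the `start-conversation` path reflect my understanding of the Databricks docs, so please verify them against your workspace. The workspace URL, space ID, and tokens are placeholders, and the functions only build the requests; they do not send them.

```python
import json
import urllib.parse
import urllib.request


def build_token_exchange_request(workspace_url: str, idp_jwt: str) -> urllib.request.Request:
    """Build (not send) the request that swaps the user's IdP JWT for a Databricks OAuth token."""
    form = urllib.parse.urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": idp_jwt,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "scope": "all-apis",  # assumption: narrow this if your setup allows it
    }).encode("utf-8")
    return urllib.request.Request(
        f"{workspace_url}/oidc/v1/token",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_genie_request(workspace_url: str, space_id: str,
                        user_token: str, question: str) -> urllib.request.Request:
    """Build (not send) a Genie start-conversation call made with the user's own token."""
    body = json.dumps({"content": question}).encode("utf-8")
    return urllib.request.Request(
        f"{workspace_url}/api/2.0/genie/spaces/{space_id}/start-conversation",
        data=body,
        headers={
            "Authorization": f"Bearer {user_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In a real backend you would send the first request with `urllib.request.urlopen`, read `access_token` from the JSON response, and pass it as `user_token` to the second. Because the Genie call carries the user's token rather than a shared one, Unity Catalog evaluates permissions for that user.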
Alternative:
Run with a Service Principal (least privilege) and isolate each tenant with views/policies (or, e.g., one SP per partner). This is simpler operationally, but it loses per-user traceability and scales worse.
2) Row-level security / per-user filtering
Enforce security in the data layer, not in prompts:
Row filters (row-level filtering) and column masks (column-level masking) in Unity Catalog. Policies evaluate the current user at read time.
ABAC via governed tags: tag columns/objects (tenant, sensitivity, role) and define policies by attributes. This scales better than one-off rules.
Dynamic views for logic spanning multiple tables (handy for partners/clients with complex rules).
Genie will generate SQL against these tables and Unity Catalog will enforce the policies automatically.
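As a rough illustration of the row-filter option, the helper below generates the two SQL statements involved: a boolean filter function and the `ALTER TABLE ... SET ROW FILTER` that attaches it. The table, function, column, and group names are hypothetical, and mapping each tenant to an account group named `tenant_<id>` is just one possible convention.

```python
from typing import List


def row_filter_statements(table: str, filter_fn: str, column: str,
                          group_prefix: str) -> List[str]:
    """Return SQL that limits each reader to rows whose tenant matches one of their groups."""
    # The filter function returns TRUE only when the current user belongs to
    # the account group derived from the row's tenant value.
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {filter_fn}({column} STRING)\n"
        f"RETURN is_account_group_member(CONCAT('{group_prefix}', {column}))"
    )
    # Attaching the function makes Unity Catalog apply it on every read,
    # including SQL generated by Genie.
    apply_filter = f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({column})"
    return [create_fn, apply_filter]


statements = row_filter_statements(
    table="main.sales.customers_orders",      # hypothetical table
    filter_fn="main.sales.tenant_filter",     # hypothetical function name
    column="tenant_id",                       # hypothetical column
    group_prefix="tenant_",                   # hypothetical group naming scheme
)
```

In a notebook you could run each statement with `spark.sql(stmt)`, or paste the SQL straight into the SQL editor.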
3) Restrict Genie to your business domain
Observed behavior (real test):
In a curated Space with a small set of tables (e.g., a single sales table), asking an off-topic question like "What's the weather in Madrid?" yielded a refusal along the lines of:
"Your question is irrelevant to the provided database, as it does not contain information about the weather or temperatures in Madrid. Please ask questions related to the data available in the customers_orders table."
In practice, when the Space is tight (few tables, strong instructions, example queries), I haven't been able to force Genie to leave the Space's domain.
How to make this reliable in production:
Curate the Space: keep very few tables/views, add clear instructions ("only answer using the provided datasets"), and include example queries. Always call the API with the correct space_id.
Portal "firewall": before invoking Genie, run a simple in-scope check. If a question doesn't map to your domain (no match to metrics/tables/terms), don't call Genie. Return a friendly message:
"I can only answer questions about <your datasets>. Try asking about <examples>."
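The in-scope check can start as a plain keyword match. This is only a sketch: the term list and the wording of the refusal are made-up examples, and a real portal would build the vocabulary from its own metrics, table names, and glossary.

```python
import re

# Hypothetical domain vocabulary for a sales/orders Space.
DOMAIN_TERMS = {
    "sales", "order", "orders", "customer", "customers",
    "revenue", "product", "products",
}

REFUSAL = (
    "I can only answer questions about sales and orders. "
    "Try asking about monthly sales by product."
)


def is_in_scope(question: str, domain_terms=DOMAIN_TERMS) -> bool:
    """Return True when the question mentions at least one domain term."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return bool(words & domain_terms)


def route(question: str) -> str:
    """Return 'genie' when the question should be forwarded, else the refusal text."""
    return "genie" if is_in_scope(question) else REFUSAL
```

With this term list, `route("What's the weather in Madrid?")` returns the refusal text and `route("Give me sales for 2021")` returns `"genie"`. A keyword match will miss paraphrases, so treat it as a cheap first gate rather than the only guard.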
Example: a user asks "Give me sales for 20189." Genie might not know whether "20189" is a typo for a year (2018/2019) or a product ID.
A pre-rewrite step in your backend can use business rules (e.g., sales exist only for 2019-2025; product IDs follow the pattern aaaaa-bbbb-cc) to produce a cleaner prompt or to route to the right Space.
This improves answer quality when users lack data/Genie context, at the cost of a small extra step in your backend.
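Here is one way those business rules could look in code, using the "20189" case. It assumes the year range 2019-2025 from the example, reads the aaaaa-bbbb-cc pattern as 5-4-2 alphanumeric characters (my guess at what it means), and treats a 5-digit number as a year with one extra keystroke. The function name and return format are invented for illustration.

```python
import re
from typing import List

VALID_YEARS = range(2019, 2026)  # sales exist only for 2019 to 2025
# Assumed reading of the aaaaa-bbbb-cc pattern: 5-4-2 alphanumeric groups.
PRODUCT_ID = re.compile(r"^[a-z0-9]{5}-[a-z0-9]{4}-[a-z0-9]{2}$")


def interpret(token: str) -> List[str]:
    """Return the plausible readings of a token under the business rules."""
    token = token.strip().lower()
    if PRODUCT_ID.match(token):
        return ["product_id"]
    if not token.isdigit():
        return []
    if len(token) == 4:
        return [f"year:{token}"] if int(token) in VALID_YEARS else []
    if len(token) == 5:
        # Drop each digit in turn and keep the results that are valid years.
        candidates = {token[:i] + token[i + 1:] for i in range(len(token))}
        return [f"year:{c}" for c in sorted(candidates) if int(c) in VALID_YEARS]
    return []
```

For "20189" the only surviving reading is `year:2019`, because 2018 falls outside the valid range, so the backend can rewrite the question to "Give me sales for 2019" before calling Genie. If more than one reading survives, asking the user to confirm is safer than guessing.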
(Optional) Pre-rewrite for clarity (LLM pass in your portal):
Add a lightweight LLM step that reformulates the user's question without changing intent, just to resolve ambiguity and align to your schema/terms.
Hope this helps,
Isi