Databricks Community

simmitil · Friday

In standard Genie Space chat, space managers can use the Monitoring tab to review prompts/conversations. However, in Agent Mode, we see the warning:

“Genie agent responses may contain results obtained using other users’ credentials and are hidden from space managers.”

A few questions:

1. Why is Agent Mode treated differently from standard Genie chat if both are ultimately querying the same underlying tables/data assets?
2. Is there currently any admin/workspace setting, governance control, or future roadmap item that would allow space managers to have fuller visibility into Agent Mode conversations/results for testing and governance purposes?
3. Do benchmarks, thumbs up/down feedback, and saved benchmark questions improve/tune both standard mode and Agent Mode behavior equally, or are they handled separately?

Would appreciate any clarification from anyone who has implemented governance/testing processes around Genie Agent Mode. Thank you!!

Lu_Wang_ENB_DBX · Friday

Answers to your questions:

1. Why Agent Mode is treated differently
- Agent Mode can generate synthesized text or report answers from multi-step reasoning. Internal product guidance says those answers may contain data outside the reviewing manager’s own RLS/CLS (row-level/column-level security) scope, so by default managers can see the prompt but cannot open the answer.
- By contrast, in standard chat mode, managers review and rerun questions using their own credentials, so what they see stays within their own access scope.
2. Current admin/governance control / roadmap
- The main control is Genie conversation/chat sharing (Beta / workspace preview setting). When enabled, new conversations default to “Reviewable by space managers”; when not enabled, conversations are Private.
- Enabling Genie Chat Sharing is the current way to let space managers inspect Agent Mode conversations/results. It applies to conversations created after the setting is turned on, unless the user makes the conversation Private.
- The documented controls are sharing and Request review.
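To make the sharing rule above concrete, here is a minimal Python sketch of how the default visibility could be reasoned about. It is only a mental model, not a Databricks API: `Conversation`, `effective_visibility`, and the field names are hypothetical, and the sketch assumes that conversations created before the setting was enabled stay Private. The actual setting is managed in the workspace, not in code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Conversation:
    """Hypothetical stand-in for a Genie conversation (not a real API object)."""
    created_at: datetime
    user_marked_private: bool = False  # the user explicitly chose Private


def effective_visibility(conv: Conversation,
                         sharing_enabled_at: Optional[datetime]) -> str:
    """Return the visibility a conversation would have under the rules described above."""
    # Chat sharing was never enabled: conversations are Private.
    if sharing_enabled_at is None:
        return "Private"
    # Assumption: conversations created before the setting was turned on are unaffected.
    if conv.created_at < sharing_enabled_at:
        return "Private"
    # The user can still make an individual conversation Private.
    if conv.user_marked_private:
        return "Private"
    return "Reviewable by space managers"
```

For governance testing, the practical takeaway is that only conversations started after the setting is enabled, and not marked Private by the user, can be expected to show up for manager review.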
3. Benchmarks / thumbs up-down / saved benchmark questions: same or separate?
- Benchmarks support both Chat and Agent Mode, but they are handled separately: Chat mode benchmarks compare against gold SQL, while Agent Mode benchmarks use the same multi-step reasoning as Agent Mode and are graded by an LLM judge.
- Benchmarks are for evaluation, not tuning: the docs explicitly say benchmark questions and example SQL in benchmarks do not improve Genie’s context.
- Thumbs up/down and review feedback help space managers refine the space (instructions, examples, suggested SQL snippets), which can improve both modes indirectly, but that is space curation, not automatic model tuning.
- Saved representative answers/messages can be turned into benchmark questions, which helps testing coverage, but again that is evaluation asset creation, not direct tuning.
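The difference between the two grading styles can be sketched in a few lines of Python. This is an illustration only, not how Genie implements benchmarks: `grade_chat_mode`, `grade_agent_mode`, and the `judge` callable are hypothetical names, the chat-mode check assumes that "comparing against gold SQL" means comparing result rows, and the judge here is a plain function standing in for an LLM.

```python
from typing import Callable, List, Tuple

Row = Tuple  # one result row, e.g. ("EMEA", 1200)


def grade_chat_mode(result_rows: List[Row], gold_rows: List[Row]) -> bool:
    """Chat-mode style: deterministic comparison with the gold SQL's result.

    Assumption: rows are compared as an unordered collection.
    """
    return sorted(result_rows, key=repr) == sorted(gold_rows, key=repr)


def grade_agent_mode(question: str, answer: str,
                     judge: Callable[[str, str], bool]) -> bool:
    """Agent-mode style: a judge decides whether the free-text answer is acceptable.

    In practice the judge would be an LLM; here it is any callable.
    """
    return judge(question, answer)


# Example usage with a trivial stand-in judge (hypothetical, keyword-based).
def keyword_judge(question: str, answer: str) -> bool:
    return "EMEA" in answer


chat_ok = grade_chat_mode([("EMEA", 1200), ("APAC", 900)],
                          [("APAC", 900), ("EMEA", 1200)])
agent_ok = grade_agent_mode("Which region had the highest revenue?",
                            "EMEA led with 1200 in revenue.",
                            keyword_judge)
```

Because one check is deterministic and the other depends on a judge, it makes sense to keep the two suites separate and to expect more run-to-run variance in Agent Mode results.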

Practical governance pattern to use

Enable Genie Chat Sharing, keep new conversations Reviewable by space managers, use Request review for edge cases, and run separate Chat-mode and Agent-mode benchmark suites for validation.