04-01-2026 06:15 AM
I am building an agentic workflow. This is a multi-agent workflow (Plan, Reason, Act, and Synthesize). Each agent has its own access to tools to take ACTIONS on data. Some of these agents are READ-only; some can WRITE/UPDATE.
All data resides in Unity Catalog, where the initial access can be granted.
When the workflow executes and the agents come up with dynamic plans, I want to control the types of ACTIONS taken on the data based on policy constraints. These are runtime actions that I want to control and monitor: some agents have read-only access, some can write (to specific datasets), and some can move data (under certain conditions).
Is there a best-practice approach to controlling agentic workflows at runtime?
3 weeks ago
Hi @smithsonian,
The general best-practice pattern is to separate what the agent plans from what the platform will actually allow it to do.
If the data already lives in Unity Catalog, a good approach is to use Unity Catalog as the system of record for data permissions and use Unity AI Gateway as the runtime governance layer for agents, LLM endpoints, and MCP servers.
In practical terms, that usually means keeping planning/reasoning agents read-only where possible and exposing actions through governed tools, such as UC functions and MCP servers, rather than letting agents perform arbitrary direct operations. I would also recommend running user-facing workflows with on-behalf-of-user authentication so that the agent cannot exceed the caller's permissions, and reserving dedicated service principals for background or automated jobs that genuinely require non-user execution.
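To make the on-behalf-of-user point concrete, here is a deliberately simplified model in plain Python (not a Databricks API): the agent can only use privileges that both it and the calling user hold. The `Principal` class, the `(action, securable)` pairs, and the table names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical privilege model: (action, securable) pairs, e.g. ("SELECT", "main.sales.orders").
Privilege = tuple[str, str]


@dataclass(frozen=True)
class Principal:
    name: str
    grants: frozenset[Privilege]


def effective_privileges(agent: Principal, caller: Principal) -> frozenset[Privilege]:
    """On-behalf-of-user: the agent never gets more than the caller already has."""
    return agent.grants & caller.grants


def can_run(agent: Principal, caller: Principal, action: str, securable: str) -> bool:
    return (action, securable) in effective_privileges(agent, caller)


# Usage: the Act agent is allowed to MODIFY, but this caller is read-only.
act_agent = Principal("act-agent", frozenset({("SELECT", "main.sales.orders"),
                                              ("MODIFY", "main.sales.orders")}))
analyst = Principal("analyst@example.com", frozenset({("SELECT", "main.sales.orders")}))

print(can_run(act_agent, analyst, "SELECT", "main.sales.orders"))  # True
print(can_run(act_agent, analyst, "MODIFY", "main.sales.orders"))  # False
```

A background job running as a service principal would use that principal's own grants instead, which is why those principals should be kept as narrow as possible.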
For write or move actions, I would strongly recommend a controlled execution pattern... plan → validate/preview → approve → execute. That gives you a checkpoint before any destructive or irreversible action and is much safer than allowing an agent to directly execute whatever it decides at runtime.
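A minimal sketch of that checkpointed pattern in plain Python. The stage names, `ProposedAction`, and the `copy_table` tool are hypothetical; in practice the approver would be a human reviewer or a separate policy service rather than a lambda.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Stage(Enum):
    PLANNED = "planned"
    VALIDATED = "validated"
    APPROVED = "approved"
    EXECUTED = "executed"
    REJECTED = "rejected"


@dataclass
class ProposedAction:
    tool: str                 # hypothetical tool name chosen by the planning agent
    args: dict[str, Any]
    preview: str = ""
    stage: Stage = Stage.PLANNED


def validate(action: ProposedAction, allowed_tools: set[str]) -> ProposedAction:
    """Reject unknown tools; otherwise record a dry-run preview of the effect."""
    if action.tool not in allowed_tools:
        action.stage = Stage.REJECTED
    else:
        action.preview = f"would call {action.tool} with {action.args}"
        action.stage = Stage.VALIDATED
    return action


def approve(action: ProposedAction, approver: Callable[[ProposedAction], bool]) -> ProposedAction:
    """Checkpoint: only validated actions can be approved."""
    if action.stage is Stage.VALIDATED:
        action.stage = Stage.APPROVED if approver(action) else Stage.REJECTED
    return action


def execute(action: ProposedAction, tools: dict[str, Callable[..., Any]]) -> Any:
    """Run the tool only if the action made it through every earlier stage."""
    if action.stage is not Stage.APPROVED:
        raise PermissionError(f"{action.tool} is {action.stage.value}, not approved")
    result = tools[action.tool](**action.args)
    action.stage = Stage.EXECUTED
    return result


# Usage with a stub tool and an approver that only allows writes into the curated schema.
tools = {"copy_table": lambda src, dst: f"copied {src} -> {dst}"}
action = ProposedAction("copy_table", {"src": "main.raw.events", "dst": "main.curated.events"})
action = approve(validate(action, set(tools)), lambda a: a.args["dst"].startswith("main.curated."))
print(action.preview)
print(execute(action, tools))  # copied main.raw.events -> main.curated.events
```

The important property is that `execute` refuses anything that did not pass both the validation and approval stages, no matter what the planner proposed.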
Another important design decision is to make the action surface narrower and suitable for policy implementation. For example, instead of giving an agent broad SQL write access, expose a small set of approved operations via UC functions or MCP tools, and then apply runtime policies to those tool calls. Databricks has publicly described this direction with service policies and payload logging for MCPs, which is exactly the sort of control plane you want for "this agent can read, this one can write only to dataset X, and this one can move data only under condition Y."
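To illustrate "this agent can read, this one can write only to dataset X, and this one can move data only under condition Y", here is a default-deny check evaluated before each tool call. It is plain Python, not the service-policy configuration format; the agent names, the tool names (`read_table`, `write_table`, `move_table`), and the move condition are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class AgentScope:
    can_read: bool = True
    writable: frozenset[str] = frozenset()                             # datasets this agent may write
    move_condition: Optional[Callable[[dict[str, Any]], bool]] = None  # None = may never move data


SCOPES = {
    "planner": AgentScope(),
    "reasoner": AgentScope(),
    "actor": AgentScope(writable=frozenset({"main.curated.orders"})),
    # Hypothetical condition: moves only into the curated schema and only for non-PII data.
    "mover": AgentScope(move_condition=lambda args: args.get("dst", "").startswith("main.curated.")
                        and args.get("contains_pii") is False),
}


def check_tool_call(agent: str, tool: str, args: dict[str, Any]) -> bool:
    """Return True only if this agent may make this call; everything else is denied."""
    scope = SCOPES.get(agent)
    if scope is None:
        return False  # unknown agents get nothing
    if tool == "read_table":
        return scope.can_read
    if tool == "write_table":
        return args.get("table") in scope.writable
    if tool == "move_table":
        return scope.move_condition is not None and scope.move_condition(args)
    return False  # default deny for any tool not listed above


print(check_tool_call("planner", "write_table", {"table": "main.curated.orders"}))  # False
print(check_tool_call("actor", "write_table", {"table": "main.curated.orders"}))    # True
print(check_tool_call("mover", "move_table",
                      {"dst": "main.curated.orders_v2", "contains_pii": False}))   # True
```

Because the tools are narrow, each check is a simple comparison on a few named arguments, which is much easier to reason about than parsing arbitrary SQL.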
Lastly, make sure every tool invocation is observable. A runtime governance model is only credible if you can answer: who called what tool, with which arguments, on whose behalf, and what happened? AI Gateway's observability and logging model is designed for that kind of audit trail, although it is still in beta.
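As a rough sketch of the fields such an audit trail needs, here is a wrapper that records the agent, the user it acts on behalf of, the tool, its arguments, and the outcome of every call. `AUDIT_LOG` is an in-memory stand-in for a durable audit table and the tools are stubs; this is not how AI Gateway logging is configured.

```python
import json
import time
from typing import Any, Callable

AUDIT_LOG: list[dict[str, Any]] = []  # stand-in for a durable, append-only audit table


def audited(tool_name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every invocation is recorded, whether it succeeds or fails."""
    def wrapper(*, agent: str, on_behalf_of: str, **kwargs: Any) -> Any:
        record: dict[str, Any] = {
            "ts": time.time(),
            "agent": agent,                # who called
            "on_behalf_of": on_behalf_of,  # whose authority was used
            "tool": tool_name,             # what tool
            "args": kwargs,                # with which arguments
        }
        try:
            result = fn(**kwargs)
            record.update(status="ok", result=repr(result))  # what happened
            return result
        except Exception as exc:
            record.update(status="error", error=f"{type(exc).__name__}: {exc}")
            raise
        finally:
            AUDIT_LOG.append(record)
    return wrapper


# Usage with two stub tools: one succeeds, one is refused.
def count_rows(table: str) -> int:
    return 42


def delete_rows(table: str, where: str) -> int:
    raise PermissionError(f"deletes on {table} are not permitted")


audited("count_rows", count_rows)(agent="reason-agent", on_behalf_of="analyst@example.com",
                                  table="main.sales.orders")
try:
    audited("delete_rows", delete_rows)(agent="act-agent", on_behalf_of="analyst@example.com",
                                        table="main.sales.orders", where="1=1")
except PermissionError:
    pass

for entry in AUDIT_LOG:
    print(json.dumps(entry))
```

Logging in a `finally` block means refused and failed calls are recorded too, which is often where the interesting evidence is.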
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
2 weeks ago
This is perfect. Thanks.
I also like your answer on runtime execution: "For write or move actions, I would strongly recommend a controlled execution pattern... plan → validate/preview → approve → execute. That gives you a checkpoint before any destructive or irreversible action and is much safer than allowing an agent to directly execute whatever it decides at runtime."
2 weeks ago
Hi Ashwin,
I want to contrast what you stated really well with what is stated or implied in the service policies documentation you shared.
You said two things:
1) "For write or move actions, I would strongly recommend a controlled execution pattern... plan → validate/preview → approve → execute. That gives you a checkpoint before any destructive or irreversible action and is much safer than allowing an agent to directly execute whatever it decides at runtime."
2) "Another important design decision is to make the action surface narrower and suitable for policy implementation. For example, instead of giving an agent broad SQL write access, expose a small set of approved operations via UC functions or MCP tools, and then apply runtime policies to those tool calls" (via service policies).
But the documentation is misleading. It starts with:
"Agents connected to external tools are taking destructive, irreversible actions in production: wiping entire databases in seconds, deleting millions of rows of critical data, and dropping production databases mid-task. In each incident, the agent was acting within the scope of their delegated authority. What it lacked was any restriction on which tools it could invoke, and any record of the actions it took."
This implies that service policies actually do "controlled execution", with their primary function being to "limit the action surface". The documentation should make this clear.
2 weeks ago
Hi @venkat-raghavan,
Thanks. Good point, and I agree the distinction should be clearer. What I was trying to separate are two related but different ideas: runtime tool-level control, and end-to-end workflow control for destructive changes.
The public documentation on service policies absolutely does imply, and in fact explicitly says, that service policies are there to narrow the action surface at runtime. The blog says the core problem in recent incidents was that agents had delegated authority but lacked restrictions on which tools they could invoke, and there was no trace of what they did. The linked incidents are consistent with that framing. One describes an agent deleting a production volume after finding a token with delete capability, another describes an agent choosing Terraform destroy as the "cleaner and simpler" option during cleanup, and the incident database entry describes an agent reportedly executing unauthorised destructive commands against production data despite repeated instructions not to make changes.
And the Databricks blog is quite direct about the fix: once MCPs are registered, you get control over what agents are allowed to do. Service policies evaluate every tool call, and admins can allow, deny, or require consent. Policies can also restrict specific tools like delete_database, or conditionally allow them only for certain actors.
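A toy evaluator helps show what "evaluate every tool call" with allow/deny/require-consent outcomes means. This is not the Databricks service-policy syntax: the `Rule` shape, first-match ordering, fail-closed default, and actor names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_CONSENT = "require_consent"


@dataclass(frozen=True)
class Rule:
    tool: str
    decision: Decision
    actors: Optional[frozenset[str]] = None  # None means the rule applies to every actor


# First matching rule wins, so the actor-specific exception comes before the blanket deny.
RULES = [
    Rule("delete_database", Decision.ALLOW, frozenset({"dba-automation-sp"})),
    Rule("delete_database", Decision.DENY),
    Rule("write_table", Decision.REQUIRE_CONSENT),
    Rule("read_table", Decision.ALLOW),
]


def evaluate(tool: str, actor: str, rules: list[Rule] = RULES) -> Decision:
    for rule in rules:
        if rule.tool == tool and (rule.actors is None or actor in rule.actors):
            return rule.decision
    return Decision.DENY  # no rule matched: fail closed


def call_tool(tool: str, actor: str, fn: Callable[..., Any],
              consent: Callable[[str, str], bool], **kwargs: Any) -> Any:
    """Gate a single tool call; the check happens before the tool runs."""
    decision = evaluate(tool, actor)
    if decision is Decision.REQUIRE_CONSENT and not consent(tool, actor):
        decision = Decision.DENY
    if decision is Decision.DENY:
        raise PermissionError(f"{actor} may not call {tool}")
    return fn(**kwargs)


print(evaluate("delete_database", "act-agent").value)          # deny
print(evaluate("delete_database", "dba-automation-sp").value)  # allow
print(evaluate("write_table", "act-agent").value)              # require_consent
```

Each decision is made for one call at a time, with no knowledge of the steps the agent took before or plans to take after.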
So I agree with your reading: service policies are not just observability. They are a runtime enforcement mechanism whose primary value is constraining the tool/action surface.
Where I was drawing a distinction is that this is still somewhat narrower than a full workflow pattern like plan → validate/preview → approve → execute. Service policies operate at the individual tool-call boundary: before a tool executes, the policy can block it, allow it, or require consent. That is powerful and important, but it is not automatically the same thing as a multi-step, sandboxed change-management workflow with staging, preview state, and commit/abort semantics.
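For contrast, here is a minimal sketch of the commit/abort side: several steps are staged against a sandbox copy, the whole change can be previewed, and the live data only changes on `commit()`. The in-memory tables and the `ChangeSet` class are hypothetical stand-ins; in a real system the sandbox would more likely be a separate staging schema or table copy.

```python
import copy
from typing import Any, Callable

Row = dict[str, Any]


class ChangeSet:
    """Stage multi-step changes in a sandbox; `live` is untouched until commit()."""

    def __init__(self, live: dict[str, list[Row]]):
        self._live = live
        self._sandbox = copy.deepcopy(live)  # staging copy the agent works against
        self._log: list[str] = []
        self.state = "open"

    def _require_open(self) -> None:
        if self.state != "open":
            raise RuntimeError(f"changeset is already {self.state}")

    def delete_where(self, table: str, predicate: Callable[[Row], bool]) -> None:
        self._require_open()
        before = len(self._sandbox[table])
        self._sandbox[table] = [r for r in self._sandbox[table] if not predicate(r)]
        self._log.append(f"delete {before - len(self._sandbox[table])} row(s) from {table}")

    def insert(self, table: str, row: Row) -> None:
        self._require_open()
        self._sandbox.setdefault(table, []).append(dict(row))
        self._log.append(f"insert 1 row into {table}")

    def preview(self) -> list[str]:
        """What a reviewer sees before deciding to commit or abort."""
        return list(self._log)

    def commit(self) -> None:
        self._require_open()
        self._live.clear()
        self._live.update(self._sandbox)  # apply every staged step together
        self.state = "committed"

    def abort(self) -> None:
        self._require_open()
        self._sandbox = {}  # discard all staged steps; live data never changed
        self.state = "aborted"


live = {"orders": [{"id": 1, "status": "stale"}, {"id": 2, "status": "active"}]}

cs = ChangeSet(live)
cs.delete_where("orders", lambda r: r["status"] == "stale")
cs.insert("orders", {"id": 3, "status": "active"})
print(cs.preview())  # ['delete 1 row(s) from orders', 'insert 1 row into orders']
print(len(live["orders"]))  # 2, because nothing has been committed yet
cs.commit()
print([r["id"] for r in live["orders"]])  # [2, 3]
```

A per-call policy could allow each of these two steps individually; the changeset is what lets a reviewer see the combined effect and accept or reject it as one unit.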