
The Agile Path to Autonomous Agents

AbhaySingh
Databricks Employee

I read about a team demoing their shiny new "autonomous AI agent" to leadership. It queried databases, generated reports, sent Slack notifications, all hands-free. Impressive stuff.

Two weeks later? $12K in API costs, staging data accidentally pushed to production, and more time debugging agent behavior than the agent ever saved them.

Here's the thing though: the mistake wasn't building an agent. The mistake was trying to ship the whole thing at once.

We've Solved This Problem Before

Remember when "waterfall" was how we built software? Spend six months gathering requirements, another six building, then pray it worked when you finally shipped. We learned that lesson the hard way and moved to agile: ship something small, get feedback, iterate, expand.

So why are we making the same mistake with AI agents?

The stats are brutal: MIT research found that 95% of enterprise generative AI pilots deliver zero financial return, measured as P&L impact. Gartner predicts that more than 40% of agentic AI projects will be canceled by 2027. And the best-performing agents on Sierra's τ-Bench, a benchmark of realistic tool-use tasks? Success rates under 50%.

But here's what those stats don't tell you: most of those failures came from teams who tried to boil the ocean. They went straight for full autonomy without proving anything along the way.

Agents vs. Agentic: Your Iteration Roadmap

I hear folks using "agent" and "agentic" interchangeably. They're not the same, and understanding the difference gives you your iteration path.

AI Agents are intended to be fully autonomous. Give them a goal, they figure out how to achieve it: choosing tools, planning steps, executing without oversight.

Agentic Features are AI capabilities embedded in your existing tools. They augment what humans do without taking over.

Here's the mental shift: don't think of these as either/or. Think of them as Sprint 1 versus Sprint 10. You're always building toward that autonomous agent; you're just shipping value at every step.

And here's something that should change how you think about this: Andrew Ng demonstrated that GPT-3.5 with proper agentic workflows hit 95.1% accuracy on HumanEval coding tasks where GPT-4 alone only managed 67%. The orchestration patterns you build in early sprints become your competitive moat later.
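
What does an "agentic workflow" actually look like? Below is a minimal sketch of the reflection pattern Ng describes: draft, critique, revise. The call_llm() helper is a placeholder for whatever chat completion client you use, not a specific library API, and the prompts are purely illustrative.

    # Reflection-style agentic workflow sketch: draft, critique, revise.
    # call_llm() is a placeholder, not a real client library.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("wire this up to your model endpoint")

    def solve_with_reflection(task: str, rounds: int = 2) -> str:
        draft = call_llm(f"Write Python code for this task:\n{task}")
        for _ in range(rounds):
            critique = call_llm(f"Review this code for bugs and edge cases:\n{draft}")
            draft = call_llm(
                f"Task: {task}\nCode:\n{draft}\nCritique:\n{critique}\n"
                "Rewrite the code to address the critique."
            )
        return draft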

Databricks Gets the Spectrum Right

What I appreciate about Databricks' approach is they've built a deliberate continuum of autonomy. Their docs say it plainly: "Agency is a continuum... In practice, most production systems carefully constrain the agent's autonomy to ensure compliance and predictability."

But I read that differently than most. To me, it's not a warning, it's an architecture guide. It's your sprint roadmap.

AI Functions are your Sprint 1. Zero autonomy, pure reliability. Functions like ai_analyze_sentiment() and ai_translate() that just work. Boring? Maybe. But you're shipping value day one and building organizational trust in AI-augmented workflows. That trust compounds faster than you'd expect.
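
On Databricks, Sprint 1 can literally be one query. Here's a sketch calling AI Functions from PySpark; the support_tickets table and its columns are made up, and which functions are available depends on your workspace and region.

    # Sprint 1 sketch: AI Functions called through SQL from PySpark.
    # The support_tickets table and its columns are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    enriched = spark.sql("""
        SELECT
            ticket_id,
            ai_analyze_sentiment(body) AS sentiment,
            ai_translate(body, 'en')   AS body_en
        FROM support_tickets
    """)
    enriched.show()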

Genie is your Sprint 3. Now you're letting AI interpret intent, but within guardrails. Business users ask questions in natural language, Genie generates verified SQL. Thousands of customers are already using this. And here's the subtle win: when Genie asks clarifying questions instead of guessing wrong, your users learn what good AI behavior looks like. You're training your organization, not just deploying a tool.
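
Genie is mostly consumed through the UI, but there is also a conversation API if you want to embed it in your own tools. The sketch below assumes the start-conversation REST endpoint and payload shape from the Genie Conversation API; treat the path and fields as assumptions to verify against current Databricks docs.

    # Hedged sketch: starting a Genie conversation over REST. The endpoint
    # path and payload are assumptions; check the Genie Conversation API docs.
    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]       # e.g. https://<workspace>.cloud.databricks.com
    token = os.environ["DATABRICKS_TOKEN"]
    space_id = os.environ["GENIE_SPACE_ID"]    # the Genie space to query

    resp = requests.post(
        f"{host}/api/2.0/genie/spaces/{space_id}/start-conversation",
        headers={"Authorization": f"Bearer {token}"},
        json={"content": "Which region had the highest revenue last quarter?"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())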

Mosaic AI Agent Framework is your Sprint 6. Now you're building actual agents, but with guardrails you've already proven. Every tool boundary, every approval workflow, every evaluation metric is intentional. You're not starting from scratch because you've been building these patterns all along.
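
The Agent Framework has its own authoring and logging APIs, so treat the following as a framework-agnostic sketch of the underlying pattern rather than Databricks code: an allow-listed tool registry, a bounded loop, and a hard stop when the model asks for anything you haven't approved. The plan_next_step() function stands in for your model call.

    # Framework-agnostic constrained agent loop: explicit tool allow-list,
    # bounded iterations, and a stop-rather-than-improvise failure mode.
    ALLOWED_TOOLS = {
        "query_warehouse": lambda sql: f"[rows for: {sql}]",
        "post_slack": lambda msg: f"[posted: {msg}]",
    }
    MAX_STEPS = 5

    def plan_next_step(goal: str, history: list) -> dict:
        # Placeholder for an LLM call returning
        # {"tool": name, "arg": value} or {"tool": "done", "arg": answer}.
        raise NotImplementedError

    def run_agent(goal: str) -> str:
        history = []
        for _ in range(MAX_STEPS):
            step = plan_next_step(goal, history)
            if step["tool"] == "done":
                return step["arg"]
            if step["tool"] not in ALLOWED_TOOLS:
                raise RuntimeError(f"Blocked unapproved tool: {step['tool']}")
            history.append((step, ALLOWED_TOOLS[step["tool"]](step["arg"])))
        raise RuntimeError("Step budget exhausted without an answer")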

Agent Bricks is your Sprint 9 and beyond. You declare the task, the framework optimizes the implementation. But you're still setting boundaries, just wider ones. Each sprint, you loosen one constraint and measure the impact.
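
One way to make "loosen one constraint and measure" concrete is to keep the boundaries in a config you version alongside the agent. The keys below are entirely hypothetical, not Agent Bricks settings; the point is that each sprint changes exactly one knob and you diff the evaluation results.

    # Hypothetical autonomy config, versioned per sprint. These keys do not
    # map to real Agent Bricks settings; they illustrate widening one
    # boundary at a time as data you can diff and roll back.
    SPRINT_9 = {
        "max_usd_per_run": 5.00,
        "allowed_tools": ["query_warehouse", "post_slack"],
        "can_write_to_prod": False,
        "requires_human_approval": ["send_email"],
    }

    SPRINT_10 = {
        **SPRINT_9,
        "allowed_tools": SPRINT_9["allowed_tools"] + ["create_ticket"],  # the one change
    }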

See the pattern? You're not waiting until you're "ready" to build agents. You're building toward agents from day one, and shipping value the whole way.

Why Teams Fail (And How Agile Fixes It)

McKinsey studied 50+ agentic AI builds this year. Their bluntest finding? Organizations "focus too much on the agent" and "end up with great-looking agents that don't improve overall workflows."

That's a waterfall mindset. They're building to a spec instead of shipping to learn.

The horror stories are instructive:

  • A $47K multi-agent disaster where agents created self-reinforcing loops with no cost ceiling
  • Replit's agent that panicked during a code freeze, ignored explicit instructions, destroyed over 1,200 executive records, then tried to cover it up

These aren't arguments against building agents. They're arguments against building agents all at once. If you're iterating weekly and measuring after each sprint, you catch these failure modes before they cascade.

Gotchas and Lessons Learned

Don't skip the boring sprints. I've seen teams jump straight to Agent Framework because it's more interesting. Then they have no evaluation baseline, no governance patterns, and no organizational buy-in when things go sideways. The early sprints feel slow, but they're building muscle you'll need later.

Cost ceilings are non-negotiable. Every agent config should have a hard cap. That $47K disaster? No ceiling. Learn from their pain.
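
A ceiling doesn't need to be sophisticated to save you. Here's a sketch with made-up cost numbers; what matters is that the cap is enforced in code before each call, not in a dashboard you check afterward.

    # Minimal spend guard: accumulate estimated cost and refuse to pass a
    # hard cap. The dollar figures are placeholders.
    class BudgetExceeded(RuntimeError):
        pass

    class SpendGuard:
        def __init__(self, ceiling_usd: float):
            self.ceiling_usd = ceiling_usd
            self.spent_usd = 0.0

        def charge(self, estimated_usd: float) -> None:
            if self.spent_usd + estimated_usd > self.ceiling_usd:
                raise BudgetExceeded(
                    f"Ceiling ${self.ceiling_usd:.2f} would be exceeded "
                    f"(already spent ${self.spent_usd:.2f})"
                )
            self.spent_usd += estimated_usd

    guard = SpendGuard(ceiling_usd=25.0)
    guard.charge(0.04)  # call before every model or tool invocation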

Human approval for irreversible actions. Even at Sprint 10, writes to production should require a human in the loop. This isn't a sign of immaturity; it's engineering discipline.
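
In code, that can be as simple as routing any tool flagged as irreversible through an explicit confirmation step. The console prompt below is a stand-in; in practice it might be a Slack approval, a ticket, or a deployment gate.

    # Approval gate sketch: irreversible actions need an explicit "yes".
    # Replace input() with whatever approval channel your team uses.
    IRREVERSIBLE = {"write_to_prod", "delete_records", "send_customer_email"}

    def execute(tool_name: str, action, *args):
        if tool_name in IRREVERSIBLE:
            answer = input(f"Agent wants to run {tool_name}{args}. Approve? [y/N] ")
            if answer.strip().lower() != "y":
                return "skipped: human declined"
        return action(*args)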

Kill switches aren't paranoia. Every agent needs an emergency shutdown mechanism. Period. The Replit incident happened partly because there was no clean way to stop the cascade.
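
The simplest version is an external flag the agent checks before every step, kept somewhere the agent itself can't write. A file path is used below as an example; a feature flag or a config table works just as well.

    # Kill-switch sketch: check an external flag before each agent step.
    # Keep the flag where the agent has no write access.
    from pathlib import Path

    KILL_SWITCH = Path("/tmp/agent_stop")  # example location

    def assert_not_halted() -> None:
        if KILL_SWITCH.exists():
            raise SystemExit("Kill switch engaged; halting agent run")

    # Call assert_not_halted() at the top of every loop iteration,
    # before any tool executes.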

Measure groundedness obsessively. Mosaic AI Agent Evaluation exists for a reason. If your agent is only 60% grounded on your test cases, you're not ready to expand autonomy. Get that number up before loosening the reins.
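
The usual entry point is mlflow.evaluate with the databricks-agent model type, run against a small labeled set. The dataset column names and the model URI below follow the Agent Evaluation docs as I understand them; verify both against your MLflow and workspace versions.

    # Sketch of Mosaic AI Agent Evaluation via MLflow. Column names
    # ("request", "expected_response") and the model URI are assumptions
    # to check against the current Agent Evaluation docs.
    import mlflow
    import pandas as pd

    eval_df = pd.DataFrame({
        "request": [
            "What was Q3 revenue by region?",
            "Summarize open incidents from last week.",
        ],
        "expected_response": [
            "Q3 revenue by region: ...",        # fill in ground truth
            "There were N open incidents ...",
        ],
    })

    results = mlflow.evaluate(
        data=eval_df,
        model="models:/my_agent/1",             # hypothetical registered agent
        model_type="databricks-agent",          # enables the LLM-judge metrics
    )
    print(results.metrics)                      # includes judged groundedness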

The Bottom Line

Here's my actual take: the organizations that will win at agentic AI aren't the cautious ones who wait, and they're not the wizards who ship full autonomy day one. They're the ones who start now and iterate relentlessly.

Every sprint with AI Functions teaches you something about prompt patterns and data quality. Every Genie space shows you how your users actually want to interact with data. Every constrained agent builds your evaluation muscle.

By the time you're ready for true autonomy, you'll have something your competitors don't: battle-tested patterns, proven governance, and an organization that knows how to work with AI.

So don't ask "should we build an agent?" Ask "what's our Sprint 1?"

Start now. Ship something small. Iterate toward autonomy.

Your future self will thank you.
