By Mandy Ross and Mitchell Grewer
At Databricks, we innovate in every department. Most recently, the Community team solved a tricky problem by building an internal app and, as part of that process, reduced our in-app AI assistant's time-to-first-response from 30 seconds to under one second for critical, repetitive queries. How did we do this, you might ask? We bypassed the multi-agent system's tool call mechanism (RAG/SQL) for known facts. This post details how we use a Lakebase-backed pattern to pre-hydrate page-aware user context server-side, inject it directly into the agent's prompt, and deliver instant, context-rich insights that feel like the agent is watching the user the entire time. This playbook is portable and relevant to any in-app agent experience where latency is non-negotiable.
The Databricks Community forums have 200,000+ members asking real, work-critical questions about building on the platform. As a way to elevate the quality and frequency of answers, last fall we launched the Community Fellows pilot: a tiered, gamified internal advocacy program that activates Bricksters (aka Databricks employees) to answer community questions, with quality scoring that flows into performance reviews.
Six months in, the pilot turned into a full-on program: Brickster reply volume is up 112%, time-to-first-response is down 83% (20 hours to under 4), and accepted-solution rate is up 27%. We scaled from 8 Fellows to 45+ without adding ops headcount.
However, that growth came with a problem. The community platform we use is not built for an internal advocacy program, so the pilot ran on spreadsheets, Slack threads, and a Google Form. Manually extracted data was error-prone, and by the time we passed 20 Fellows, the ops layer was eating 2 to 3 hours a day. Additionally, Fellows were answering the same questions at the same time, causing confusion and disappointment, so we needed to organize and track activity better.
While our challenge was specific to scaling this community program, the core technical problem we are addressing is how to quickly build a user-friendly AI assistant and tackle slow supervisor agents. This is relevant to any in-app agent experience where instant responses are critical.
The solution: we built our way out with the Community Fellows Hub, a Databricks App backed by Lakebase, with Agent Bricks running an in-app AI assistant named CORA (Community Observations, Research, and Analytics).
As we built this, one of the most problematic things we faced was the slow responses from CORA. We needed to do something about it, and this post explores a critical technical hurdle in that journey: bridging the gap between slow agent reasoning and the need for a truly instant AI assistant.
To give Fellows the support they needed, CORA was built to handle two primary tasks: providing real-time status updates on their performance (points, rank, active claims) and assisting them with answering community questions by retrieving relevant documentation and past discussions. For example, a Fellow might ask, "What's my current rank?" or "What's the best doc on Unity Catalog grants?"
To achieve this depth, CORA is a multi-agent system: a supervisor with two children, a knowledge assistant using RAG over our community conversations and a Genie agent using SQL over our metric views. Genie plans queries, generates SQL, runs them, and writes up an answer. The knowledge assistant retrieves and reranks documentation. End-to-end: 20 to 30 seconds per call.
That's a fair price for a chat window where the user expects depth. It is not acceptable for a sidebar that should feel instant. A few seconds of dead air is the difference between something Fellows use every day and something they quietly ignore.
The data CORA needs most often (current points, rank, active claims, recent activity, badge proximity) already lives in Lakebase, our transactional Postgres layer. Lakebase reads come back in under 100ms.
So instead of waiting for the multi-agent system to dispatch a tool call to fetch that data, the app hydrates it server-side in the same request that builds CORA's prompt, and injects it straight into context. CORA answers immediately with data she'd otherwise have spent 30 seconds fetching from Genie.
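The latency argument can be sketched with a toy timing test. Everything here is a stand-in: `fake_lakebase_query` and its 50 ms sleep simulate a Lakebase read and are not real calls. The point it illustrates is that when the reads run concurrently, hydration costs roughly as much as the slowest single query rather than the sum of all of them.

```python
import asyncio
import time

# Hypothetical stand-in for a Lakebase read; the sleep simulates ~50 ms latency.
async def fake_lakebase_query(name: str, delay_s: float = 0.05) -> dict:
    await asyncio.sleep(delay_s)
    return {"query": name}

async def hydrate(names: list[str]) -> tuple[dict, float]:
    """Run all queries concurrently and report the wall-clock time taken."""
    start = time.perf_counter()
    rows = await asyncio.gather(*(fake_lakebase_query(n) for n in names))
    elapsed = time.perf_counter() - start
    return dict(zip(names, rows)), elapsed

QUERY_NAMES = ["claims", "qtd_points", "rank_distance", "badge_proximity"]
results, elapsed = asyncio.run(hydrate(QUERY_NAMES))
print(f"{len(results)} queries in {elapsed * 1000:.0f} ms")
```

Run sequentially, the four simulated reads would take about 200 ms; run together, they finish in roughly the time of one.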
The flow:
The frontend sends a page_context payload: the current route, plus any entity the user is looking at (a question, an appeal, a fellow card).
The server passes that payload to build_fellow_context().
CORA can't see the UI, so the app tells her what the user is looking at. On the leaderboard, she gets the user's rank, the gap to the people around them, and what those people have been up to lately. On a question the user is about to answer, she gets the question itself plus the user's track record on similar ones.
She also doesn't wait to be asked. Open the side panel and she's already talking: "two questions you signed up for are about to expire, and you're 40 points from your next rank." Click a question to answer and a brief appears before the user has finished reading the title: this looks like a Unity Catalog grants issue, here's the doc that usually fixes it, here's a thread from last month where someone solved it.
That's what makes the non-obvious read possible too: "you're 3 points behind #5, but the person at #4 hasn't answered anything in two weeks, so that gap will close on its own." She didn't reason her way there in real time. The app curated the data. She synthesized it.
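To make "the app curated the data" concrete, here is a minimal sketch of how a leaderboard window might be prepared before it reaches the prompt. The `Row` shape, the `leaderboard_window` helper, the 14-day staleness threshold, and the sample rows are all illustrative assumptions, not the Hub's actual schema or data.

```python
from dataclasses import dataclass

@dataclass
class Row:
    name: str
    points: int
    days_since_last_answer: int

def leaderboard_window(rows, me, radius=2, stale_after_days=14):
    """Summarize the user's rank, the point gap to nearby ranks, and who is inactive."""
    ranked = sorted(rows, key=lambda r: r.points, reverse=True)
    idx = next(i for i, r in enumerate(ranked) if r.name == me)
    lines = [f"You are #{idx + 1} with {ranked[idx].points} pts"]
    lo, hi = max(0, idx - radius), min(len(ranked), idx + radius + 1)
    for i in range(lo, hi):
        if i == idx:
            continue
        r = ranked[i]
        gap = r.points - ranked[idx].points  # positive means they are ahead
        note = " (inactive)" if r.days_since_last_answer >= stale_after_days else ""
        lines.append(f"#{i + 1} {r.name}: {gap:+d} pts{note}")
    return "\n".join(lines)

# Made-up sample rows, for illustration only.
sample = [
    Row("a", 200, 2), Row("b", 150, 1), Row("c", 120, 3),
    Row("d", 110, 15), Row("e", 103, 1), Row("me", 100, 0), Row("g", 80, 4),
]
window = leaderboard_window(sample, "me")
print(window)
```

The agent only has to notice that the fellow at #4 is flagged inactive; the comparison itself was done in plain code before the prompt was built.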
The full Genie path stays open for explicit quantitative questions like "how many Fellows answered Lakeflow questions last quarter?", but it's the exception now, not the default.
Unity Catalog governs both our transactional and analytical data the same way. Lakebase is registered as a catalog in UC, right alongside our Delta tables. Our gold-layer tables sync into Lakebase via UC synced tables, so rank and points stay current with no cache to invalidate.
The agent gets a fast OLTP query path on the same governed tables that power our dashboards. Time-to-first-token stays under a second, and the insight feels like CORA's been watching the whole time.
If you want to cut time-to-first-token for an agent inside an app, here's the playbook. It works because the app knows things the agent doesn't: who the user is, what they're looking at, what they're likely to ask about. That's enough signal to pre-fetch a useful slice of state before the agent runs.
This isn't a universal speedup for agentic chat. The pattern lives in the sweet spot where the app has more context about the interaction than the agent does.
You're not replacing tool calls, you're skipping the predictable ones.
Define a small page_context schema in your frontend: current route, plus any visible entity IDs. Send it on every chat request, and also on navigation with no user message attached. Route both to the same handler so CORA can speak first when the user lands somewhere new.
// One small type, used everywhere the user might trigger CORA
interface PageContext {
  route: string;                      // "/leaderboard", "/claims/123"
  entity_type?: string;               // "claim", "appeal", "fellow"
  entity_id?: string;
  filters?: Record<string, string>;
}

// Sent on every chat message AND on navigation (with empty message,
// so CORA can speak first when the user lands somewhere new).
// conversation_id is assumed to be available in the enclosing scope.
async function chat(message: string, pageContext: PageContext) {
  await fetch("/api/chat/messages", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ conversation_id, message, page_context: pageContext }),
  });
}
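On the server side, the same payload has to be parsed and validated before anything is built from it. The sketch below is one way that could look, using only the standard library. The `ChatRequest` fields mirror what the post's handler reads (`conversation_id`, `message`, `page_context`), but the dataclass, the `parse_chat_request` helper, and the key allow-list are assumptions; in a FastAPI app this would more likely be a Pydantic model.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Keys taken from the PageContext interface; anything else is dropped.
ALLOWED_CONTEXT_KEYS = {"route", "entity_type", "entity_id", "filters"}

@dataclass
class ChatRequest:
    conversation_id: str
    message: str = ""                    # empty on navigation events
    page_context: Optional[dict] = None  # kept as a dict so .get("route") works

def parse_chat_request(payload: dict[str, Any]) -> ChatRequest:
    """Validate a decoded JSON body and return a ChatRequest (hypothetical helper)."""
    ctx = payload.get("page_context")
    if ctx is not None:
        if (not isinstance(ctx, dict)
                or not isinstance(ctx.get("route"), str)
                or not ctx["route"]):
            raise ValueError("page_context.route must be a non-empty string")
        # Drop unknown keys so nothing unexpected reaches the prompt.
        ctx = {k: v for k, v in ctx.items() if k in ALLOWED_CONTEXT_KEYS}
    return ChatRequest(
        conversation_id=str(payload["conversation_id"]),
        message=payload.get("message") or "",
        page_context=ctx,
    )

# A navigation event: page context present, no user message.
nav_event = parse_chat_request({
    "conversation_id": "c-1",
    "message": "",
    "page_context": {"route": "/claims/123", "entity_type": "claim",
                     "entity_id": "123", "debug": True},
})
print(nav_event.page_context, repr(nav_event.message))
```

Treating a navigation event as an ordinary request with an empty message is what lets one handler serve both cases.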
Then write per-page context functions on the server. What does the agent need to know about this page that the user can't already see on screen? That's the gold. Fire the queries in parallel and assemble the results into a tight markdown block the agent can read at a glance. Finally, inject the block ahead of the user's message before calling the supervisor.
import asyncio

# Each route declares the queries CORA actually needs to answer for that page.
# Most pages share a base set; detail pages add one or two extras.
PAGE_QUERIES = {
    "default": {"claims", "qtd_points", "leaderboard_window"},
    "leaderboard": {"claims", "qtd_points", "rank_distance", "badge_proximity"},
    "claim_detail": {"claims", "qtd_points", "reply_brief"},
}

async def build_fellow_context(fellow, *, page_context=None) -> str:
    # Assumes the route has already been normalized to a PAGE_QUERIES key;
    # anything unrecognized falls back to the default query set.
    route = (page_context or {}).get("route", "default")
    needed = PAGE_QUERIES.get(route, PAGE_QUERIES["default"])

    # Fire every required query in parallel. All hit Lakebase (<100ms each).
    coros = {name: run_lakebase_query(name, fellow["fellow_id"]) for name in needed}
    results = dict(zip(coros, await asyncio.gather(*coros.values())))

    # Format into a tight markdown block the agent can read at a glance.
    parts = [
        f"Fellow: {fellow['first_name']} {fellow['last_name']}",
        f"QTD points: {results['qtd_points']['total']} (rank #{results['qtd_points']['rank']})",
        f"Active claims: {len(results['claims'])}",
    ]
    if "rank_distance" in results:
        parts.append(f"Gap to #{results['rank_distance']['target_rank']}: "
                     f"{results['rank_distance']['delta']} pts")
    return "\n".join(parts)
@router.post("/chat/messages")
async def send(body: ChatRequest, fellow = Depends(get_current_user)):
    fellow_context = await build_fellow_context(fellow, page_context=body.page_context)
    messages = [
        {"role": "user", "content": f"[SYSTEM CONTEXT]\n{SYSTEM_PROMPT}"},
        {"role": "user", "content": f"[FELLOW CONTEXT]\n{fellow_context}"},
        *load_history(body.conversation_id),
        {"role": "user", "content": body.message},  # empty on navigation events
    ]
    return await call_supervisor(messages)
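One detail the snippets leave open is how a raw path such as "/claims/123" becomes a PAGE_QUERIES key such as "claim_detail". A minimal sketch of that step follows, assuming a small pattern table. `ROUTE_PATTERNS` and `page_key_for` are hypothetical names, and the mapping from "/claims/<id>" to "claim_detail" is a guess at the intent rather than something the post states.

```python
import re

# Hypothetical mapping from URL path patterns to PAGE_QUERIES keys.
ROUTE_PATTERNS = [
    (re.compile(r"^/leaderboard/?$"), "leaderboard"),
    (re.compile(r"^/claims/[^/]+/?$"), "claim_detail"),
]

def page_key_for(route):
    """Return the PAGE_QUERIES key for a raw route, or "default" if none match."""
    if route:
        for pattern, key in ROUTE_PATTERNS:
            if pattern.match(route):
                return key
    return "default"

for path in ["/leaderboard", "/claims/123", "/settings", None]:
    print(path, "->", page_key_for(path))
```

With a helper like this, the context builder could look up `PAGE_QUERIES[page_key_for(route)]` and still fall back to the default query set for pages it does not recognize.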
Finally, instruct the agent to use the provided context instead of a tool call.
The system injects a [FELLOW CONTEXT] block before each conversation.
It includes the fellow's identity, status, points, rank, active claims,
recent recommendations, and accepted suggestions.
Use this context directly for status questions; do NOT call agents to
re-fetch data that is already provided. Only call agents when the fellow
asks for something beyond the context (e.g., searching for specific
community discussions, looking up metrics trends, or exploring a topic
in depth).
Do NOT call Community-Analytics-Genie when the answer is already in
the injected context. Fellow points, rank, claim counts, queue depth,
recommendations, and accepted suggestions are pre-fetched; use them.
If a fellow asks "what's my rank this quarter," the context already
has it; don't call Genie.
CORA is one piece of a larger build, but the pattern is portable: any agent UX where latency matters and the answers come from data you already own can do this.
11m ago - last edited 6m ago
It's an absolute pleasure collaborating with the community team! It's so cool to see this engagement pattern live, where the agent (CORA) engages when the page loads instead of the user having to type a question.