Databricks Community

MandyR · 2 hours ago

At Databricks, we innovate in every department. Most recently, the Community team solved a tricky problem by building an internal app, and along the way we cut our in-app AI assistant’s time-to-first-response from 30 seconds to under one second for critical, repetitive queries. How did we do it, you might ask? For known facts, we bypassed the multi-agent system’s tool calls (RAG and SQL) entirely. This post details how we use a Lakebase-backed pattern to pre-hydrate page-aware user context server-side, inject it directly into the agent’s prompt, and deliver instant, context-rich insights that make it feel like the agent has been watching alongside the user the entire time. The playbook is portable to any in-app agent experience where latency is non-negotiable.

Slow agents and scaling a community program

The Databricks Community forums have 200,000+ members asking real, work-critical questions about building on the platform. As a way to elevate the quality and frequency of answers, last fall we launched the Community Fellows pilot: a tiered, gamified internal advocacy program that activates Bricksters (aka Databricks employees) to answer community questions, with quality scoring that flows into performance reviews.

Six months in, the pilot turned into a full-on program: Brickster reply volume is up 112%, time-to-first-response is down 83% (20 hours to under 4), and accepted-solution rate is up 27%. We scaled from 8 Fellows to 45+ without adding ops headcount.

However, that growth came with a problem. The community platform we use isn’t built for an internal advocacy program, so the pilot ran on spreadsheets, Slack threads, and a Google Form. Manually extracted data was error-prone, and by the time we passed 20 Fellows, the ops layer was eating 2–3 hours a day. Fellows were also answering the same questions at the same time, causing confusion and disappointment, so we needed a better way to organize and track activity.

While our challenge was specific to scaling this community program, the core technical problem is more general: how to quickly build a user-friendly AI assistant when the supervisor agent behind it is slow. It applies to any in-app agent experience where instant responses are critical.

The solution: we built our way out with the Community Fellows Hub, a Databricks App backed by Lakebase, with Agent Bricks running an in-app AI assistant named CORA (Community Observations, Research, and Analytics).

As we built it, one of the biggest problems we faced was CORA’s slow responses. This post explores that hurdle: bridging the gap between slow agent reasoning and the need for a truly instant AI assistant.

The bottleneck: supervisor agents aren’t fast enough

To give Fellows the support they needed, CORA was built to handle two primary tasks: providing real-time status updates on their performance (points, rank, active claims) and assisting them with answering community questions by retrieving relevant documentation and past discussions. For example, a Fellow might ask, “What’s my current rank?” or “What’s the best doc on Unity Catalog grants?”

To achieve this depth, CORA is a multi-agent system. A supervisor coordinates two child agents: a knowledge assistant that runs RAG over our community conversations, and a Genie agent that runs SQL over our metric views. Genie plans queries, generates SQL, runs it, and writes up an answer. The knowledge assistant retrieves and reranks documentation. End-to-end: 20 to 30 seconds per call.

That’s a fair price in a chat window where the user expects depth. It isn’t acceptable in a sidebar that should feel instant. A few seconds of dead air is the difference between something Fellows use every day and something they quietly ignore.

The fix: hydrate context from Lakebase before the agent runs

The data CORA needs most often (current points, rank, active claims, recent activity, badge proximity) already lives in Lakebase, our transactional Postgres layer. Lakebase reads come back in under 100ms.

So instead of waiting for the multi-agent system to dispatch a tool call to fetch that data, the app hydrates it server-side in the same request that builds CORA’s prompt, and injects it straight into context. CORA answers immediately with data she’d otherwise have spent 30 seconds fetching from Genie.

The flow:

1. The chat request carries a page_context payload: the current route, plus any entity the user is looking at (a question, an appeal, a fellow card).
2. The FastAPI handler reads it and calls build_fellow_context().
3. A handful of small Lakebase queries fire in parallel:
   - active claims
   - quarter-to-date points ledger
   - rank window
   - plus one or two page-specific queries keyed to the route
4. Results are assembled into a structured markdown block, a few hundred tokens at most.
5. The block is injected ahead of the user’s message as its own context payload, with the supervisor instructed to use it instead of fetching the same facts itself.
6. The supervisor sees a fully hydrated picture of the user’s state and the page they’re on. It answers directly instead of dispatching to Genie for facts it already has.
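The payoff of firing the queries in parallel is that hydration costs roughly the slowest query, not the sum of all of them. A minimal sketch, assuming hypothetical stand-in queries (`fake_query`) that just sleep for a sub-100ms latency instead of hitting Lakebase:

```python
import asyncio
import time


async def fake_query(name: str, latency_s: float) -> dict:
    # Hypothetical stand-in for one Lakebase query: wait, then return a tiny row.
    await asyncio.sleep(latency_s)
    return {"query": name}


async def hydrate(latencies: dict[str, float]) -> tuple[dict[str, dict], float]:
    """Run all queries concurrently; return (results keyed by name, elapsed seconds)."""
    start = time.perf_counter()
    names = list(latencies)
    rows = await asyncio.gather(*(fake_query(n, latencies[n]) for n in names))
    elapsed = time.perf_counter() - start
    return dict(zip(names, rows)), elapsed


if __name__ == "__main__":
    latencies = {"claims": 0.05, "qtd_points": 0.08, "rank_window": 0.09, "page_extra": 0.06}
    results, elapsed = asyncio.run(hydrate(latencies))
    # Wall time tracks the slowest query (~0.09s), not the sum (~0.28s).
    print(sorted(results), f"{elapsed:.2f}s")
```

Because `asyncio.gather` returns results in the order the awaitables were passed, zipping them back onto the query names is safe.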

Page-aware, unprompted insight

CORA can’t see the UI, so the app tells her what the user is looking at. On the leaderboard, she gets the user’s rank, the gap to the people around them, and what those people have been up to lately. On a question the user is about to answer, she gets the question itself plus the user’s track record on similar ones.

She also doesn’t wait to be asked. Open the side panel and she’s already talking: “two questions you signed up for are about to expire, and you’re 40 points from your next rank.” Click a question to answer and a brief appears before the user has finished reading the title: this looks like a Unity Catalog grants issue, here’s the doc that usually fixes it, here’s a thread from last month where someone solved it.
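The facts behind an unprompted opener like this are cheap to derive once claims and points are hydrated. A minimal sketch, assuming a hypothetical `Claim` record with an expiry timestamp and a fixed point threshold for the next rank (neither schema comes from the post); the resulting strings would go into the context block for CORA to phrase, not straight to the user:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Claim:
    question_id: str
    expires_at: datetime  # hypothetical: when the Fellow's claim on the question lapses


def opening_nudges(claims: list[Claim], qtd_points: int, next_rank_threshold: int,
                   now: datetime, expiry_window: timedelta = timedelta(hours=24)) -> list[str]:
    """Facts worth raising before the user asks anything."""
    nudges = []
    # Claims that have not lapsed yet but will within the window.
    expiring = [c for c in claims if now <= c.expires_at <= now + expiry_window]
    if expiring:
        hours = int(expiry_window.total_seconds() // 3600)
        nudges.append(f"{len(expiring)} claimed question(s) expire within {hours}h")
    gap = next_rank_threshold - qtd_points
    if gap > 0:
        nudges.append(f"you're {gap} points from your next rank")
    return nudges


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    claims = [Claim("q1", now + timedelta(hours=3)),
              Claim("q2", now + timedelta(hours=20)),
              Claim("q3", now + timedelta(days=3))]
    print(opening_nudges(claims, qtd_points=160, next_rank_threshold=200, now=now))
```

An empty list is a useful signal too: if there is nothing worth saying, the app can skip the proactive message.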

That’s what makes the non-obvious read possible too: “you’re 3 points behind #5, but the person at #4 hasn’t answered anything in two weeks, so that gap will close on its own.” She didn’t reason her way there in real time. The app curated the data. She synthesized it.
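A read like that depends on the app handing CORA the right slice of the leaderboard: the neighbors, the gaps, and who has gone quiet. A minimal sketch, assuming a hypothetical `Row` with points and a last-answer date; `radius` and `stale_days` are illustrative parameters, and ties keep their input order:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Row:
    fellow_id: str
    points: int
    last_answer: date  # hypothetical: date of the Fellow's most recent answer


def rank_window(leaderboard: list[Row], fellow_id: str, today: date,
                radius: int = 2, stale_days: int = 14) -> list[str]:
    """Describe the Fellows within `radius` ranks of `fellow_id`, flagging idle ones."""
    ordered = sorted(leaderboard, key=lambda r: r.points, reverse=True)  # stable sort
    idx = next((i for i, r in enumerate(ordered) if r.fellow_id == fellow_id), None)
    if idx is None:
        raise ValueError(f"{fellow_id} is not on the leaderboard")
    me = ordered[idx]
    lines = [f"You: #{idx + 1}, {me.points} pts"]
    lo, hi = max(0, idx - radius), min(len(ordered), idx + radius + 1)
    for j in range(lo, hi):
        if j == idx:
            continue
        r = ordered[j]
        delta = r.points - me.points
        direction = "ahead" if delta >= 0 else "behind"
        idle = (today - r.last_answer).days
        note = f", no answers in {idle} days" if idle >= stale_days else ""
        lines.append(f"#{j + 1}: {abs(delta)} pts {direction}{note}")
    return lines


if __name__ == "__main__":
    recent, stale = date(2025, 3, 14), date(2025, 3, 1)
    board = [Row("a", 120, recent), Row("b", 110, recent), Row("c", 100, recent),
             Row("d", 95, stale), Row("e", 90, recent), Row("me", 87, recent)]
    for line in rank_window(board, "me", today=date(2025, 3, 15)):
        print(line)
```

With this data, the output has the same shape as the example above: the user sits 3 points behind #5, and #4 has not answered in 14 days. CORA only has to connect the dots.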

The full Genie path stays open for explicit quantitative questions like “how many Fellows answered Lakeflow questions last quarter?”, but it’s the exception now, not the default.

Freshness comes from below

Unity Catalog governs both our transactional and analytical data the same way. Lakebase is registered as a catalog in UC, right alongside our Delta tables. Our gold-layer tables sync into Lakebase via UC synced tables, so rank and points stay current with no cache to invalidate.

The agent gets a fast OLTP query path on the same governed tables that power our dashboards. Time-to-first-token stays under a second, and the insight feels like CORA’s been watching the whole time.

The pattern: please steal!

If you want to cut time-to-first-token for an agent inside an app, here’s the playbook. It works because the app knows things the agent doesn’t: who the user is, what they’re looking at, what they’re likely to ask about. That’s enough signal to pre-fetch a useful slice of state before the agent runs.

This isn’t a universal speedup for agentic chat. The pattern lives in the sweet spot where the app has more context about the interaction than the agent does.

1. Find your agent’s hot path: the questions it gets asked over and over (points, rank, recent activity, the state of whatever the user is working on) that it currently answers with a tool call.
2. Make sure that data lives somewhere fast. Lakebase is our pick (obviously). If your source of truth is a Delta table, sync the slice you need into Postgres.
3. Hydrate in the same handler that takes the user’s message. Run the queries in parallel and assemble the results into a structured markdown block. Keep it tight.
4. Inject the block ahead of the user’s message before calling the agent. The supervisor treats it as known context and stops dispatching tools to find what’s already there.
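Steps 3 and 4 can be sketched in a few lines. This is a minimal sketch, not the Hub’s code: `build_context_block` and `inject_context` are hypothetical helpers, and a character budget stands in crudely for a token budget to keep the block tight. Sections are passed in priority order, so the least important ones are dropped first:

```python
def build_context_block(sections: dict[str, list[str]], max_chars: int = 1500) -> str:
    """Render sections as markdown, in priority order, within a character budget.

    Characters are a crude stand-in for tokens. The first section that would
    overflow the budget, and everything after it, is dropped.
    """
    chunks: list[str] = []
    used = 0
    for title, lines in sections.items():  # dicts preserve insertion order
        chunk = f"## {title}\n" + "\n".join(f"- {line}" for line in lines)
        if used + len(chunk) > max_chars:
            break
        chunks.append(chunk)
        used += len(chunk) + 2  # +2 for the blank-line separator
    return "\n\n".join(chunks)


def inject_context(block: str, history: list[dict], user_message: str) -> list[dict]:
    """Place the context block ahead of the conversation history and the new message."""
    return [
        {"role": "user", "content": f"[FELLOW CONTEXT]\n{block}"},
        *history,
        {"role": "user", "content": user_message},
    ]


if __name__ == "__main__":
    block = build_context_block({
        "Status": ["QTD points: 160 (rank #6)", "Active claims: 2"],
        "Leaderboard": ["Gap to #5: 3 pts"],
    })
    for message in inject_context(block, [], "what's my rank?"):
        print(message["role"], message["content"], sep=": ")
```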

You’re not replacing tool calls, you’re skipping the predictable ones.

To make the experience page-aware

Define a small page_context schema in your frontend: the current route, plus any visible entity IDs. Send it on every chat request, and also on navigation with no user message attached. Route both to the same handler so CORA can speak first when the user lands somewhere new.

// One small type, used everywhere the user might trigger CORA
interface PageContext {
  route: string;                         // "/leaderboard", "/claims/123"
  entity_type?: string;                  // "claim", "appeal", "fellow"
  entity_id?: string;
  filters?: Record<string, string>;
}

// Sent on every chat message AND on navigation (with an empty message,
// so CORA can speak first when the user lands somewhere new).
async function chat(conversationId: string, message: string, pageContext: PageContext) {
  const res = await fetch("/api/chat/messages", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ conversation_id: conversationId, message, page_context: pageContext }),
  });
  // Surface HTTP errors instead of failing silently; the caller reads the body.
  if (!res.ok) throw new Error(`chat request failed: ${res.status}`);
  return res;
}
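On the server, the same payload has to be decoded, and an empty message marks a navigation event. A minimal sketch using a standard-library dataclass in place of whatever request model the app actually uses (in FastAPI this would more likely be a Pydantic model); `parse_chat_request` is a hypothetical helper:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageContext:
    # Mirrors the frontend PageContext type.
    route: str = "/"
    entity_type: Optional[str] = None
    entity_id: Optional[str] = None
    filters: dict[str, str] = field(default_factory=dict)


def parse_chat_request(payload: dict) -> tuple[str, PageContext, bool]:
    """Return (message, page context, is_navigation) from a decoded JSON body."""
    raw = payload.get("page_context") or {}
    ctx = PageContext(
        route=raw.get("route", "/"),
        entity_type=raw.get("entity_type"),
        entity_id=raw.get("entity_id"),
        filters=dict(raw.get("filters") or {}),
    )
    message = (payload.get("message") or "").strip()
    # An empty message means the user just navigated, so CORA should speak first.
    return message, ctx, message == ""


if __name__ == "__main__":
    print(parse_chat_request({"conversation_id": "c1", "message": "",
                              "page_context": {"route": "/leaderboard"}}))
```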

Then write per-page context functions on the server. What does the agent need to know about this page that the user can’t already see on screen? That’s the gold. Fire the queries in parallel and assemble the results into a tight markdown block the agent can read at a glance. Then inject the block ahead of the user’s message before calling the supervisor.

import asyncio

# Each route declares the queries CORA actually needs to answer for that page.
# Most pages share a base set; detail pages add one or two extras.
PAGE_QUERIES = {
    "default":      {"claims", "qtd_points", "leaderboard_window"},
    "leaderboard":  {"claims", "qtd_points", "rank_distance", "badge_proximity"},
    "claim_detail": {"claims", "qtd_points", "reply_brief"},
}


def page_key(route: str) -> str:
    # Routes arrive as URL paths ("/leaderboard", "/claims/123"), so map them
    # onto PAGE_QUERIES keys instead of using the raw path as the key.
    if route.startswith("/leaderboard"):
        return "leaderboard"
    if route.startswith("/claims/"):
        return "claim_detail"
    return "default"


async def build_fellow_context(fellow, *, page_context=None) -> str:
    # page_context is the decoded JSON dict from the request (may be None).
    route = (page_context or {}).get("route", "")
    needed = PAGE_QUERIES[page_key(route)]
    # Fire every required query in parallel. All hit Lakebase (<100ms each).
    # run_lakebase_query is the app's own async helper (not shown here).
    coros = {name: run_lakebase_query(name, fellow["fellow_id"]) for name in needed}
    results = dict(zip(coros, await asyncio.gather(*coros.values())))
    # Format into a tight markdown block the agent can read at a glance.
    parts = [
        f"Fellow: {fellow['first_name']} {fellow['last_name']}",
        f"QTD points: {results['qtd_points']['total']} (rank #{results['qtd_points']['rank']})",
        f"Active claims: {len(results['claims'])}",
    ]
    # Page-specific extras only appear when that route requested them.
    if "rank_distance" in results:
        parts.append(f"Gap to #{results['rank_distance']['target_rank']}: "
                     f"{results['rank_distance']['delta']} pts")
    return "\n".join(parts)


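from fastapi import APIRouter, Depends

# Assumed to be mounted with app.include_router(router, prefix="/api") to match
# the frontend's "/api/chat/messages". ChatRequest, get_current_user,
# SYSTEM_PROMPT, load_history and call_supervisor are app-specific (not shown).
router = APIRouter()


@router.post("/chat/messages")
async def send(body: ChatRequest, fellow=Depends(get_current_user)):
    # Hydrate from Lakebase in the same request, before the agent runs.
    fellow_context = await build_fellow_context(fellow, page_context=body.page_context)
    messages = [
        {"role": "user", "content": f"[SYSTEM CONTEXT]\n{SYSTEM_PROMPT}"},
        {"role": "user", "content": f"[FELLOW CONTEXT]\n{fellow_context}"},
        *load_history(body.conversation_id),
        {"role": "user", "content": body.message},   # empty on navigation events
    ]
    return await call_supervisor(messages)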

Finally, instruct the agent to use the provided context instead of a tool call.

The system injects a [FELLOW CONTEXT] block before each conversation.
It includes the fellow's identity, status, points, rank, active claims,
recent recommendations, and accepted suggestions.

Use this context directly for status questions — do NOT call agents to
re-fetch data that is already provided. Only call agents when the fellow
asks for something beyond the context (e.g., searching for specific
community discussions, looking up metrics trends, or exploring a topic
in depth).

Do NOT call Community-Analytics-Genie when the answer is already in
the injected context. Fellow points, rank, claim counts, queue depth,
recommendations, and accepted suggestions are pre-fetched — use them.
If a fellow asks "what's my rank this quarter," the context already
has it; don't call Genie.

Try Lakebase

CORA is one piece of a larger build, but the pattern is portable: any agent UX where latency matters and the answers come from data you already own can do this.


Commitchell · 11m ago

It's an absolute pleasure collaborating with the community team! It's so cool to see this engagement pattern live, where the agent (CORA) engages when the page loads instead of the user having to type a question.