The OLTP architecture your agentic systems actually need, and how it compares to Supabase, Azure PostgreSQL, and Cosmos DB
Earlier this year, Nikita Shamgunov — the engineer leading Databricks Lakebase — published a number that reframed my entire architecture review: AI agents now create roughly 4x more databases than human developers.
Not 4x more queries. 4x more databases.
If you're building agentic AI systems on Databricks and still reaching for Supabase, Azure Database for PostgreSQL, or Cosmos DB as your OLTP layer — this article will challenge that decision. Not because those platforms are bad. They're not. But because they were designed for a world where humans write schemas, humans provision databases, and humans decide when something scales. Agents don't work that way. And the architecture that serves human-paced development quietly breaks under agentic workloads.
I learned this the hard way while building an internal Agentic Intelligence Platform at Celebal Technologies — three agent modules (Swarm Coordination, Ontology-Based Reasoning, and Causal Optimization) sharing a unified LLMOps spine on Databricks. I'll show you exactly what I got wrong in the database layer, what Lakebase changes, and how the alternatives stack up for teams building enterprise AI on Databricks.
Traditional database architecture assumes a human-paced world. Applications write transactions. Dashboards read. ETL pipelines shuttle data between the OLTP and OLAP layers. The entire stack was designed around predictable access patterns and a well-understood divide between operational and analytical data.
Agents shatter all three of those assumptions simultaneously.
They're inherently ephemeral. A swarm agent coordinating a supply chain analysis spins up, decomposes a task across five specialist agents, writes hundreds of state checkpoints, and terminates — all in under thirty seconds. The next invocation may run on a completely different thread with zero shared context from the prior session. Legacy databases aren't built for disposable, bursty compute that needs to scale to zero between workloads and spin back up instantly for the next one.
They generate massive, high-frequency state churn. Every tool call, reasoning step, context retrieval, and handoff between agents is a potential checkpoint. For a multi-turn swarm agent handling a complex analytical task, that's hundreds of writes per session — each requiring exact-ID retrieval by thread_id or session_id, not vector similarity search. Postgres handles this natively. A Delta table, even a well-ZORDER'd one, adds overhead for an access pattern it was never designed to serve.
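To make that access pattern concrete, here is a minimal sketch of a keyed checkpoint table. It is not the Lakebase or LangGraph schema: the table and column names (`checkpoints`, `thread_id`, `step`, `state`) are hypothetical, and Python's built-in `sqlite3` stands in for Postgres only so the sketch runs without a server.

```python
import json
import sqlite3
from typing import Optional

# sqlite3 is a stand-in for Postgres here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE checkpoints (
        thread_id TEXT    NOT NULL,
        step      INTEGER NOT NULL,
        state     TEXT    NOT NULL,
        PRIMARY KEY (thread_id, step)
    )
    """
)


def save_checkpoint(thread_id: str, step: int, state: dict) -> None:
    """Write one JSON-encoded checkpoint row for a thread."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO checkpoints (thread_id, step, state) VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )


def latest_checkpoint(thread_id: str) -> Optional[dict]:
    """Fetch the most recent state for a thread via the primary-key index."""
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ? "
        "ORDER BY step DESC LIMIT 1",
        (thread_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Simulate a session that checkpoints after every tool call.
for step in range(200):
    save_checkpoint("thread-42", step, {"step": step, "tool": "search"})
```

The composite primary key is the design choice that matters: the lookup by `thread_id` is an index seek, so its cost does not grow with the number of other sessions in the table.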
They need to reach analytical data without crossing a platform boundary. An agent recommending inventory adjustments needs to query the Gold Delta tables — the same tables your ML models trained on, governed by the same Unity Catalog policies your data engineering team enforces. If your OLTP layer lives outside Databricks, you're building a data copy pipeline just so your agent can read data that's already on the platform.
The second of those, state churn, is where I went wrong.
When I built the Swarm Coordination module of our Agentic Intelligence Platform, I used a Unity Catalog Delta table as the shared persistent memory store for multi-turn agent sessions. Delta was a reasonable first choice: it gave me time travel for session debugging, UC lineage on every agent write, and the ability to query session history in Spark SQL.
But Delta is an OLAP-optimized storage format. When the coordinator agent needed to retrieve the exact current state for a specific thread_id, it was running a scan-optimized query engine against a point-lookup workload. I added ZORDER on (session_id, turn_number) and tuned file sizes — which helped. But it was always the wrong tool for the access pattern.
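A toy model shows why the mismatch persists even after clustering. It assumes a deliberately simplified picture of file-level data skipping, where each file tracks only the minimum and maximum key it holds, and compares that with a sorted key index. The file layout, sizes, and function names are all hypothetical, and nothing here calls Delta or Postgres.

```python
from bisect import bisect_left

ROWS_PER_FILE = 1_000

# 100 sessions with 100 turns each, sorted (perfectly clustered),
# then cut into fixed-size "files" with min/max statistics per file.
rows = [(session, turn) for session in range(100) for turn in range(100)]
files = [rows[i:i + ROWS_PER_FILE] for i in range(0, len(rows), ROWS_PER_FILE)]
stats = [(f[0], f[-1]) for f in files]  # (min_key, max_key) per file


def lookup_with_file_skipping(key):
    """Prune files by min/max stats, then scan every row in surviving files."""
    rows_read = 0
    found = False
    for (lo, hi), f in zip(stats, files):
        if lo <= key <= hi:  # this file cannot be skipped
            for row in f:
                rows_read += 1
                if row == key:
                    found = True
    return found, rows_read


def lookup_with_index(key):
    """Binary search over a sorted key index, then read exactly one row."""
    i = bisect_left(rows, key)
    found = i < len(rows) and rows[i] == key
    return found, (1 if found else 0)


skipping_result = lookup_with_file_skipping((42, 7))
index_result = lookup_with_index((42, 7))
```

Even with ideal clustering, the skipping path in this model still reads a whole file to return one row, while the index path reads only the row itself. Real engines are more sophisticated than this, but the shape of the trade-off is the same.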
What the architecture actually needed was a clean separation of concerns:
Lakebase is the transactional half of that equation. And it's the piece I didn't have.
Lakebase is Databricks' fully managed, serverless PostgreSQL database, built on the architecture of Neon (the serverless Postgres company Databricks acquired) and integrated natively into the Databricks platform. It reached General Availability in February 2026. Here are the capabilities that directly change the agent architecture:
Lakebase is a supported LangGraph checkpointer backend on both Databricks Apps and Model Serving endpoints. Authentication between your application and Lakebase is resolved automatically through the platform's Service Principal — no credential management in application code, no secret rotation for a separate database connection string.
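    from databricks.sdk import WorkspaceClient
    from langgraph.checkpoint.postgres import PostgresSaver
    from langgraph.prebuilt import create_react_agent

    # Databricks resolves authentication automatically via the Service Principal
    w = WorkspaceClient()
    # Illustrative accessor; confirm the exact method name against your databricks-sdk version
    conn_str = w.lakebase.get_connection_string(instance_name="agent-state-prod")

    # LangGraph Postgres checkpointer backed by Lakebase.
    # from_conn_string is a context manager in current langgraph-checkpoint-postgres releases.
    with PostgresSaver.from_conn_string(conn_str) as checkpointer:
        checkpointer.setup()  # creates the checkpoint tables on first use

        # The agent now has durable, OLTP-grade session state
        # (model and tools are assumed to be defined elsewhere)
        agent = create_react_agent(model, tools, checkpointer=checkpointer)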
This is the pattern you'd apply to the Swarm Coordination module. The coordinator's session state — which agent it's routing to, which specialist has already responded, the current confidence score — lives in Lakebase. The MLflow Trace of the full execution graph is separate (logged as a Databricks artifact). Two different concerns, two different stores, each doing what it does best.
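For illustration, the coordinator state described above might be shaped roughly like this. The class and field names are my own assumptions, not part of LangGraph or Lakebase, and in practice the LangGraph checkpointer handles serialization itself. The sketch only shows what kind of record ends up in the transactional store.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CoordinatorState:
    """Hypothetical shape of the coordinator's per-thread session state."""

    thread_id: str
    routing_to: str  # specialist currently being consulted
    responded: List[str] = field(default_factory=list)
    confidence: float = 0.0

    def record_response(self, agent: str, confidence: float) -> None:
        """Mark a specialist as done and update the running confidence."""
        if agent not in self.responded:
            self.responded.append(agent)
        self.confidence = confidence

    def to_json(self) -> str:
        """Serialize to the payload that would be written as a checkpoint."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "CoordinatorState":
        """Rebuild the state from a stored checkpoint payload."""
        return cls(**json.loads(payload))


state = CoordinatorState(thread_id="thread-42", routing_to="forecasting")
state.record_response("forecasting", confidence=0.82)
state.routing_to = "inventory"

# Round trip: what would be written to, and read back from, the OLTP store.
restored = CoordinatorState.from_json(state.to_json())
```

The record is small, keyed by `thread_id`, and overwritten often, which is the profile of a transactional row rather than an analytical fact.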
Database branching is the capability that directly addresses the "4x more databases" pattern. Lakebase supports copy-on-write branching: it creates a full, isolated branch of a production-scale database in under one second, at near-zero initial storage cost, because only the diffs are written as data changes.
For agents, this changes what's possible:
Databricks telemetry shows production Lakebase deployments averaging roughly 10 branches per database project, with some agent-driven workflows reaching hundreds of nested iterations. That pattern is structurally impossible with traditional managed Postgres where creating a copy requires duplicating the full storage filesystem.
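The copy-on-write idea itself fits in a few lines. This is a toy key-value model, not how Lakebase or Neon implement branching at the storage layer; the `Branch` class and its methods are hypothetical names chosen for the sketch.

```python
from typing import Any, Dict, Optional

_DELETED = object()  # tombstone marking a key removed on a branch


class Branch:
    """Toy copy-on-write branch: stores only its own changes."""

    def __init__(self, parent: Optional["Branch"] = None) -> None:
        self.parent = parent
        self.diff: Dict[str, Any] = {}

    def branch(self) -> "Branch":
        """Create a child branch in constant time; no data is copied."""
        return Branch(parent=self)

    def put(self, key: str, value: Any) -> None:
        """Record a write on this branch only."""
        self.diff[key] = value

    def delete(self, key: str) -> None:
        """Hide a key on this branch without touching the parent."""
        self.diff[key] = _DELETED

    def get(self, key: str, default: Any = None) -> Any:
        """Read from the nearest branch in the ancestry that has the key."""
        node: Optional[Branch] = self
        while node is not None:
            if key in node.diff:
                value = node.diff[key]
                return default if value is _DELETED else value
            node = node.parent
        return default


prod = Branch()
prod.put("sku-1", {"stock": 120})
prod.put("sku-2", {"stock": 40})

experiment = prod.branch()  # instant, shares all of prod's data
experiment.put("sku-1", {"stock": 90})
experiment.delete("sku-2")
```

Creating the branch costs nothing up front, and the experiment's storage footprint is just its two changes. One simplification to be aware of: this toy does not freeze the parent at the branch point, so later writes to `prod` would show through to `experiment`, which a real point-in-time branch would not allow.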
Agent workloads are bursty in a way that application workloads rarely are. Thousands of concurrent sessions during business hours, complete silence at 2am. Lakebase scales its compute up under load and down to zero between workloads — costs align with actual usage, not provisioned capacity. For multi-agent platforms running on Databricks Apps, this means the transactional backend matches the compute model of the application layer itself.
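A back-of-the-envelope sketch of that cost argument follows. Every number in it is hypothetical (the load profile, the flat rate, the one-hour billing granularity), and it ignores details such as compute size under load and cold-start latency, so treat it as arithmetic and not as a pricing model.

```python
from typing import Sequence


def provisioned_cost(hours: int, rate_per_hour: float) -> float:
    """Always-on instance: billed for every hour regardless of load."""
    return hours * rate_per_hour


def usage_based_cost(sessions_by_hour: Sequence[int], rate_per_hour: float) -> float:
    """Scale-to-zero model: billed only for hours with at least one session."""
    active_hours = sum(1 for sessions in sessions_by_hour if sessions > 0)
    return active_hours * rate_per_hour


# Hypothetical day: busy from 09:00 to 17:00, silent otherwise.
load = [0] * 9 + [1500] * 8 + [0] * 7
RATE = 1.0  # hypothetical price per compute-hour, in arbitrary units

always_on = provisioned_cost(len(load), RATE)
scale_to_zero = usage_based_cost(load, RATE)
```

With this profile the always-on instance is billed for 24 hours and the scale-to-zero one for 8. The gap widens as the share of idle time grows, which is the situation the bursty agent workloads above tend to create.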
Every write to Lakebase is automatically synced to Delta tables in Unity Catalog. For agent systems, this is what closes the long-term memory loop without custom code:
Lakebase instances are registered in Unity Catalog under the same three-level namespace as your Delta tables and ML models. The same row-level security policies, column masking, lineage graphs, and access audit logs that govern energy_nz.solar.gold also govern the Lakebase instance storing agent session state. For enterprise AI systems operating under regulatory oversight, this is a structural requirement, not a preference.
Supabase is an excellent platform for its target use case. Postgres, auth, storage, real-time subscriptions, and edge functions bundled into a working backend in minutes — at $25/month, it's exceptionally competitive for early-stage web applications. But for enterprise agentic systems on Databricks, there are two structural gaps that don't close with configuration: there is no Unity Catalog (agents operating on governed enterprise data need the same governance layer as the data itself), and there is no Lakehouse sync (analytical data still requires an ETL pipeline to reach Supabase, and Supabase data requires an ETL pipeline to reach the Lakehouse for monitoring and ML). Supabase asks you to build and maintain that bridge. Lakebase eliminates it.
Azure Database for PostgreSQL Flexible Server is a solid choice for traditional Azure-native transactional workloads. But compute and storage are coupled: creating an isolated development copy of a production database requires duplicating the full storage volume, an operation that can take hours for large databases and is charged by the gigabyte. There is no native database branching, no Lakehouse sync, and the governance model (Azure RBAC) is entirely separate from Unity Catalog. For teams building on Azure Databricks who want a single governance boundary across OLTP, OLAP, and ML, this means managing two different access control systems with no native bridge between them.
Azure Cosmos DB is purpose-built for globally distributed, multi-region, flexible-schema NoSQL workloads, a genuinely different problem from agentic state management. Its core NoSQL API is not PostgreSQL-compatible, which means LangGraph's Postgres checkpointer doesn't apply, standard psycopg2 drivers don't connect, and the document model doesn't naturally represent the relational shape of session checkpoints and handoff records. Cosmos DB is the right answer for a different question.
With Lakebase available, the architecture of each of the three modules changes in specific ways:
Module 1 — Swarm Coordination:
Module 2 — Ontology-Based Reasoning:
Module 3 — Causal Optimization:
The net effect: short-term transactional operations at Postgres latency, long-term analytical operations at Delta scale, a single Unity Catalog governance layer across both, and zero custom ETL pipelines connecting them.
A credible recommendation has boundaries. Lakebase is not the right choice when:
The decision criterion is simple: how close is your agent workload to your Databricks analytics and ML stack? The closer it is, the more Lakebase earns its place.
Databricks started as the platform where you process and model data. Unity Catalog made it the platform where you govern data. Lakebase makes it the platform where you run transactional applications on that data, without copying it, without bridging governance models, and without maintaining a second operational stack alongside your analytics stack.
The 4x database creation stat isn't a curiosity. It's a forcing function. When agents provision databases at that rate, every architectural inefficiency — the manual provisioning, the ETL pipeline, the separate governance model — compounds at agent speed. Human architects designed those inefficiencies in; agents will expose them.
With the Agentic Intelligence Platform architecture mentally rebuilt around Lakebase, the change turns out to be structural, not additive. It's the difference between three systems (OLTP, OLAP, ML) connected by pipelines you maintain, and one platform where those boundaries exist only in your mental model.
If this resonated, I'd welcome your thoughts in the comments — especially if you've hit the OLTP/OLAP boundary problem in your own agentic architectures. What did your workaround look like?