Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Why We Moved Our Operational Database Into Databricks — And Stopped Managing Two Stacks

naveen0808
New Contributor II

Lakebase just went GA. Here's what a production migration actually looks like.


For most of the last decade, our data infrastructure lived in two separate worlds.

On one side: a transactional database handling operational workloads — the writes, the lookups, the real-time application queries. On the other: the lakehouse handling analytics, ML features, historical reporting, everything batch.

These two worlds never fully talked to each other. Data moved between them through pipelines. Pipelines broke. Governance existed in one place but not the other. When an ML model needed features derived from operational data, you'd build a sync job, pray it stayed in sync, and explain to stakeholders why the numbers in the app and the numbers in the dashboard were subtly different.

This is the architecture most data teams are still running. It's familiar enough that most people have stopped questioning it.

When Databricks released Lakebase into general availability this year, I decided it was worth questioning.


What the problem actually was

The split-stack problem sounds abstract until you live through a specific version of it.

Our operational system handled real-time booking and status updates. Our analytics lakehouse handled everything downstream — revenue reporting, demand forecasting, customer behavior analysis. Getting data from one to the other meant a pipeline with a lag. That lag was acceptable for reporting. It was not acceptable when the ML model feeding real-time pricing decisions was working off features that were hours behind the current state of the world.

The standard fix is to build faster pipelines. We did. The pipelines helped but didn't solve the root issue: two systems, two governance models, two places where schema changes could silently break something downstream before anyone noticed.

What we actually needed was one system — transactional and analytical in the same governed layer.


What Lakebase is

Lakebase is Databricks' serverless PostgreSQL product. The architectural premise is straightforward: run OLTP workloads — the kind of transactional writes and point lookups that live in your application database — inside the same Unity Catalog governance layer that already governs your lakehouse.

In practice this means a real PostgreSQL-compatible database. Standard connection strings. Standard client libraries. Your application code doesn't know it's talking to Databricks. But the data inside it is governed, observable, and accessible to the same pipelines, notebooks, and ML workflows that read your Delta tables.
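To make "standard connection strings" concrete, here is a minimal sketch of building a libpq-style PostgreSQL URI in Python. The host, database name, user, and password are placeholders I made up for illustration, and `build_postgres_dsn` is a hypothetical helper, not part of any Databricks SDK. Check your own instance details for the real endpoint and authentication method.

```python
from urllib.parse import quote, unquote, urlsplit


def build_postgres_dsn(host, database, user, password, port=5432, sslmode="require"):
    """Build a standard libpq-style PostgreSQL connection URI."""
    # Percent-encode credentials and database name so characters such as
    # '@' or ':' cannot be mistaken for URI delimiters.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
    )


# All values below are hypothetical placeholders.
dsn = build_postgres_dsn(
    host="db.example.invalid",
    database="bookings",
    user="app_user@example.com",
    password="s3cret:token",
)

# Parse the URI back to confirm that the encoding round-trips.
parts = urlsplit(dsn)
```

A string in this shape can be handed to an ordinary PostgreSQL driver, for example `psycopg.connect(dsn)`, which is the sense in which the application code does not need to know what is on the other end.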

The feature that changes the architecture calculus most is database branching. You can create a full copy of a production database in seconds — not minutes, not a backup restore — and use it for testing, for staging deployments, for letting a data scientist explore without touching production state. When you're done, you discard the branch. The underlying storage is shared, so the copy is nearly free until you start writing to it.
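If "nearly free until you start writing" sounds like magic, the idea behind it is copy-on-write. The toy model below is my own illustration of that idea using plain Python dictionaries. It is not how Lakebase stores data internally, and the `Branch` class and its methods are hypothetical names. It only shows why a branch can be created without copying anything and why writes on the branch leave the parent untouched.

```python
class Branch:
    """Toy copy-on-write view over a shared base snapshot (illustration only)."""

    _DELETED = object()  # sentinel marking keys removed on this branch

    def __init__(self, base):
        self._base = base   # shared with the parent; this class never mutates it
        self._delta = {}    # only the keys this branch has written or deleted

    def get(self, key, default=None):
        # Branch-local changes win; otherwise fall through to the shared base.
        if key in self._delta:
            value = self._delta[key]
            return default if value is Branch._DELETED else value
        return self._base.get(key, default)

    def put(self, key, value):
        self._delta[key] = value

    def delete(self, key):
        self._delta[key] = Branch._DELETED

    def private_size(self):
        """Number of entries stored only for this branch."""
        return len(self._delta)


# Hypothetical production state.
production = {"booking:1": "CONFIRMED", "booking:2": "PENDING"}

branch = Branch(production)               # no data is copied at creation
size_at_creation = branch.private_size()  # nothing is stored privately yet
branch.put("booking:2", "CANCELLED")      # the first write is stored on the branch only
```

Discarding the branch is just dropping its private delta, which is why throwing a branch away is as cheap as creating one in this model.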


What the migration looked like

We moved a non-critical but real operational workload first. Deliberately not our most important system — we wanted to understand failure modes without the pressure of a production incident.

The migration itself was less dramatic than expected. Lakebase is PostgreSQL-compatible, which meant our application connection strings changed and almost nothing else did. Stored procedures, queries, ORM configurations — they came over cleanly.

What required real work was rethinking how we had been handling environment promotion. Previously, promoting a schema change from development to staging to production involved backup-restore cycles, migration scripts, and coordination across two teams. With branching, the workflow became: create a branch from production, run the migration against the branch, validate, merge. The same pattern a software engineer uses for code, applied to database state.
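The branch, migrate, validate, promote loop can be sketched roughly as follows. To keep it runnable anywhere, this uses Python's built-in `sqlite3` as a stand-in and treats "branching" as a database copy, so it shows the shape of the workflow rather than the Lakebase API. The table, column, and function names are hypothetical, and "merge" here is assumed to mean re-running the validated migration against production.

```python
import sqlite3

# Hypothetical schema change to promote.
MIGRATION = "ALTER TABLE bookings ADD COLUMN status TEXT NOT NULL DEFAULT 'PENDING'"


def create_branch(source):
    """Stand-in for branching: copy the source database into a new in-memory one."""
    branch = sqlite3.connect(":memory:")
    source.backup(branch)
    return branch


def validate(conn):
    """Check that the new column exists and every row has a value for it."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(bookings)")]
    if "status" not in columns:
        return False
    nulls = conn.execute(
        "SELECT COUNT(*) FROM bookings WHERE status IS NULL"
    ).fetchone()[0]
    return nulls == 0


def promote(production, migration):
    """Run the migration on a branch first; touch production only if it validates."""
    branch = create_branch(production)
    try:
        branch.execute(migration)  # an exception here leaves production untouched
        if not validate(branch):
            return False
    finally:
        branch.close()             # discard the branch either way
    production.execute(migration)
    production.commit()
    return True


# Hypothetical production database with two existing rows.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, customer TEXT)")
production.executemany(
    "INSERT INTO bookings (customer) VALUES (?)", [("ada",), ("grace",)]
)
production.commit()

promoted = promote(production, MIGRATION)
statuses = [row[0] for row in production.execute("SELECT status FROM bookings ORDER BY id")]
```

The point of the pattern is that a failed migration or a failed validation happens on a disposable copy of real data, not on a hand-built staging database that drifted from production months ago.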

The first time we used this in a real deployment it felt slightly wrong — it was too easy. That feeling faded.


What changed for the ML team

This is where the real payoff showed up, and it wasn't something I had fully anticipated when we started.

Previously, building ML features from operational data meant a pipeline, a lag, and a constant negotiation about acceptable staleness. The model knew about the world as it was some hours ago. For some use cases that was fine. For anything real-time or near-real-time, it was a constraint we worked around rather than solved.

With the operational data in Lakebase and Lakebase inside Unity Catalog, the ML feature pipeline is a query, not a sync job. The features are derived directly from the live operational state. The lag went from hours to the latency of a SQL query.
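Here is roughly what "a query, not a sync job" means in code. This is a sketch only: it uses `sqlite3` as a local stand-in for the operational database, and the `bookings` table, its columns, the feature names, and the sample rows are all invented for illustration rather than taken from our system.

```python
import sqlite3

# Hypothetical feature definition: recent demand and average price per route.
FEATURE_SQL = """
SELECT route,
       COUNT(*)   AS bookings_in_window,
       AVG(price) AS avg_price_in_window
FROM bookings
WHERE booked_at >= :window_start
GROUP BY route
ORDER BY route
"""


def recent_demand_features(conn, window_start):
    """Compute per-route features directly from the operational table."""
    rows = conn.execute(FEATURE_SQL, {"window_start": window_start}).fetchall()
    return {
        route: {"bookings_in_window": count, "avg_price_in_window": avg_price}
        for route, count, avg_price in rows
    }


# Made-up sample rows; ISO-8601 timestamps compare correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (route TEXT, price REAL, booked_at TEXT)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [
        ("SFO-JFK", 300.0, "2024-01-01T11:30:00"),
        ("SFO-JFK", 340.0, "2024-01-01T11:45:00"),
        ("SFO-JFK", 280.0, "2024-01-01T08:00:00"),  # falls outside the window
        ("LAX-SEA", 120.0, "2024-01-01T11:50:00"),
    ],
)

features = recent_demand_features(conn, "2024-01-01T11:00:00")
```

There is no intermediate copy to keep fresh in this setup: the features are as current as the table they are read from.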

More importantly: the governance model is the same. Column-level permissions, data lineage, access auditing — the operational data gets the same treatment as everything else in the lakehouse. We stopped maintaining two permission models and stopped explaining to compliance why certain data was governed in one system but not the other.


What doesn't work yet

Lakebase is GA but it's early GA. A few things worth knowing before you start planning a migration:

Complex analytical queries with large aggregations don't belong in Lakebase. It's OLTP-optimized. For heavy analytics you still read from Delta tables in your lakehouse. The architecture isn't Lakebase replacing everything — it's Lakebase handling the operational write path while Delta handles the analytical read path, with Unity Catalog connecting both.

Region availability is still rolling out. Check your specific cloud and region before planning anything time-sensitive.

The branching feature is powerful but requires you to rethink how you test database migrations. Teams with deeply embedded backup-restore workflows will need to update their runbooks. Not hard, but it requires intentional change.


The honest verdict

We didn't eliminate the complexity of running operational and analytical workloads together. We moved that complexity into the platform instead of carrying it ourselves.

The pipelines that used to sync data between two stacks are gone. The permission model that existed in two places exists in one. The ML features that used to be hours stale are current. The deployment workflow that used to involve backup-restore cycles uses branching.

None of these are revolutionary in isolation. Together, they add up to a meaningful reduction in the operational overhead of running a data platform that serves both applications and analytics.

The two-stack world made sense for a long time because there was no good alternative. There's an alternative now.


Naveen Ayalla is a Senior Data Engineer with experience building petabyte-scale data platforms and real-time ML pipelines across aviation and enterprise technology.

1 REPLY

Mailendiran
New Contributor III

Great write-up, and it felt useful. Thanks for sharing the real experience!