Databricks Lakebase Just Eliminated the Wall Between Applications and Analytics

Brahmareddy
Esteemed Contributor

Hello Data Professionals, I need to tell you about something.

I have been working with data platforms for a long time. Long enough to remember when "big data" was the buzzword, when Hadoop was the answer to everything, when data lakes were going to replace data warehouses, and when we all realized they would not.

In that time, I have seen many product launches. Most of them are incremental. A faster engine. A better UI. A new connector. Useful but forgettable. You update your stack, adjust your pipelines, and move on.

Every few years, something comes along that is different. Not a better version of what existed before. A completely new way of thinking about the problem.

Databricks Lakebase is one of those things.

And I do not think the data engineering community fully understands yet how big this is.

The problem that has existed for decades

If you have built data systems for any meaningful amount of time, you know this pain intimately.

Your applications write data to one place. An operational database. PostgreSQL. MySQL. SQL Server. DynamoDB. Whatever your team chose. This is where your users interact with your product. Orders are placed. Profiles are updated. Transactions are recorded. This is your OLTP system.

Your analytics happen somewhere else. A data warehouse. A data lake. A lakehouse. This is where your analysts build dashboards, your data scientists train models, and your leadership team makes decisions. This is your OLAP system.

Between these two worlds, there is a wall. And that wall has a name. ETL.

Every organization builds pipelines to move data from the operational database to the analytical platform. Extract it. Transform it. Load it. Every day. Every hour. Sometimes every few minutes.
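
To make that concrete, here is a minimal sketch of the kind of job I am describing. Every name in it, the host, the tables, the loader function, is a hypothetical stand-in.

```python
# Minimal sketch of the classic nightly extract-transform-load loop.
# All names here (hosts, tables, the loader) are hypothetical.
import psycopg


def run_nightly_etl():
    # Extract: pull yesterday's rows from the operational database.
    with psycopg.connect("postgresql://app-db.internal/appdb") as conn:
        rows = conn.execute(
            "SELECT id, customer_id, total, created_at "
            "FROM orders WHERE created_at >= now() - interval '1 day'"
        ).fetchall()

    # Transform: reshape to the warehouse schema. If the source schema
    # drifts (a renamed column, a new field), this is where the job
    # silently breaks.
    records = [
        {"order_id": r[0], "customer": r[1], "amount_usd": float(r[2])}
        for r in rows
    ]

    # Load: write into the analytical store, hours after the fact.
    load_into_warehouse(records)  # hypothetical loader for the warehouse
```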

These pipelines break. The source schema changes and nobody told the data team. The transformation logic has a bug that nobody catches for three weeks. The load job fails at 3 AM and nobody knows until the morning dashboard is empty.

The entire modern data stack exists, in large part, to manage the consequences of keeping operational and analytical data in separate places.

I have spent years of my career building, maintaining, debugging, and rebuilding these pipelines. Every data engineer I know has. It is the water we swim in. We accepted it as the cost of doing business.

Lakebase says: what if we just removed the wall?

What Lakebase actually is

Let me explain this as simply as I can, because the official announcements use a lot of architecture language that can obscure the fundamental insight.

Lakebase is a fully managed, serverless PostgreSQL database that runs inside the Databricks platform. That sentence alone is interesting but not revolutionary. Lots of companies offer managed Postgres.

Here is what makes it different.

When your application writes data to Lakebase, that data is stored directly in lakehouse storage. Not in a separate database engine that needs to be synced to the lakehouse later. Not in an isolated OLTP system that requires ETL pipelines to feed your analytics. Directly in the lakehouse. In Delta format. Governed by Unity Catalog.

Read that again.

Your application writes an order. That order exists in the lakehouse immediately. Your analyst can query it immediately. Your data scientist can include it in a training dataset immediately. Your dashboard refreshes with it immediately.

No pipeline. No ETL. No sync job. No lag. No "the data will be available in the warehouse tomorrow morning."

The operational data and the analytical data are the same data. In the same place. Governed by the same system.

That is the breakthrough.

Why this matters more than it sounds

I know what some of you are thinking. "Okay, it is a managed Postgres in Databricks. That is convenient but is it really that big a deal?"

Yes. And here is why.

Think about how much of your data engineering work exists solely because operational and analytical data live in different places.

The ingestion pipelines that move data from your application database to your lakehouse. Gone. Lakebase writes directly to lakehouse storage.

The CDC (Change Data Capture) systems that track what changed in your operational database so you can replicate those changes to your warehouse. Gone. The changes are already in the lakehouse because that is where the data lives.

The data freshness problems where your dashboard shows yesterday's numbers because the nightly ETL has not run yet. Gone. The data is current because there is no copy. There is one source.

The governance headaches where you maintain separate access controls for your application database and your lakehouse and hope they stay in sync. Gone. Unity Catalog governs everything in one place.

The schema drift incidents where your application team adds a column and your ETL pipeline breaks because nobody communicated the change. Dramatically reduced. The schema exists in one place.

I am not saying ETL pipelines disappear entirely. External data sources still need ingestion. Third-party APIs still need connectors. Legacy systems still need integration. But the largest and most painful category of data movement, the flow between your own applications and your own analytics, just became unnecessary.

That is not an incremental improvement. That is a category elimination.

What I built with it

I spent two weeks experimenting with Lakebase after it went GA. I wanted to see if the promise held up in practice. Here is what I found.

Setting it up took less time than I expected. Lakebase is serverless with autoscaling and scale-to-zero. You create a project, get a Postgres connection string, and start writing data. No instance sizing. No replica configuration. No storage provisioning. It scales up when traffic increases and scales to zero when idle. You pay for what you use.
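
To give a feel for it, here is roughly what my first connection looked like. The connection string, credentials, and table below are placeholders; the real string comes from your Lakebase instance.

```python
# Connecting to Lakebase is ordinary Postgres: one connection string,
# no instance sizing. The DSN below is a placeholder.
import psycopg

conn = psycopg.connect(
    "postgresql://myuser:mytoken@instance.example.lakebase.host:5432/mydb",
    sslmode="require",  # managed endpoints typically require TLS
)

with conn, conn.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "  id bigserial PRIMARY KEY,"
        "  customer_id bigint NOT NULL,"
        "  total numeric(10, 2) NOT NULL,"
        "  created_at timestamptz DEFAULT now())"
    )
```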

If you know Postgres, you know Lakebase. It is standard Postgres with full compatibility. The extensions I needed were there, including pgvector for vector search. My existing application code connected without modifications. The ORM worked. The migration tools worked. The monitoring tools worked.
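
As a sketch of what that compatibility looks like in practice, here is pgvector doing a nearest-neighbor search over the same connection. The table and the tiny four-dimensional vectors are toy stand-ins, and I am assuming the vector extension can be enabled on your instance.

```python
# Reusing `conn` from the previous sketch. Table and vectors are toys.
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "  id bigserial PRIMARY KEY,"
        "  body text,"
        "  embedding vector(4))"
    )
    cur.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        ("returns policy FAQ", "[0.1, 0.2, 0.3, 0.4]"),
    )
    # Nearest neighbors by L2 distance, pgvector's <-> operator.
    cur.execute(
        "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 3",
        ("[0.1, 0.25, 0.3, 0.4]",),
    )
    print(cur.fetchall())
```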

The moment that made me stop and think was when I wrote a row from my application and then queried it from a Databricks notebook using SQL. Same data. Same governance. No pipeline. No delay. I had spent so many years accepting that operational and analytical data exist in different worlds that seeing them unified felt genuinely strange. Like a constraint I had internalized had been quietly removed.
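
Here is the shape of that round trip. The application side is plain Postgres; the notebook side is plain SQL. The catalog path in the notebook query is a hypothetical example, since the actual name depends on how the instance is registered in Unity Catalog.

```python
import psycopg

DSN = "postgresql://myuser:mytoken@instance.example.lakebase.host:5432/mydb"

# Application side: an ordinary Postgres INSERT.
with psycopg.connect(DSN) as conn:
    conn.execute(
        "INSERT INTO orders (customer_id, total) VALUES (%s, %s)",
        (42, 19.99),
    )

# Notebook side (separately, in Databricks): the same row via SQL.
# `spark` is the ambient SparkSession in a notebook; the catalog and
# schema names below are hypothetical placeholders.
spark.sql(
    "SELECT customer_id, total, created_at "
    "FROM my_lakebase_catalog.public.orders "
    "ORDER BY created_at DESC"
).show()
```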

Instant branching was another feature that changed how I think about development. You can create a zero-copy clone of your production database in seconds. Not a snapshot that takes 20 minutes. Not a restore from backup. A zero-copy branch that appears instantly. I used it to test a schema migration against production data without touching production. When I was done, I merged the changes. The whole workflow felt like Git for databases.
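
The branch itself is created outside the code (I used the UI); what you get back is just another Postgres connection string, shown here as a placeholder. Testing the migration then looks like ordinary Postgres work.

```python
# Testing a schema migration against a zero-copy branch. The DSN is
# a placeholder for the branch's connection string.
import psycopg

branch_dsn = "postgresql://myuser:mytoken@branch-xyz.example.lakebase.host:5432/mydb"

with psycopg.connect(branch_dsn) as conn, conn.cursor() as cur:
    # Run the migration exactly as it would run in production.
    cur.execute("ALTER TABLE orders ADD COLUMN currency text DEFAULT 'USD'")

    # Verify against production-shaped data, risk-free: existing rows
    # should have picked up the default.
    cur.execute("SELECT count(*) FROM orders WHERE currency IS NULL")
    assert cur.fetchone()[0] == 0
```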

Point-in-time recovery lets you restore to a specific millisecond. Not "last night's backup." A specific millisecond. I accidentally ran an UPDATE without a WHERE clause during testing. Restored to 30 seconds before the mistake. No data lost. No panic.
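
The recovery itself happens through the platform, so there is nothing to show in code. But the incident reinforced a client-side habit worth sketching: run destructive statements in an explicit transaction and sanity-check the row count before committing.

```python
# The mistake in question, plus the habit that limits the blast radius.
import psycopg

DSN = "postgresql://myuser:mytoken@instance.example.lakebase.host:5432/mydb"

with psycopg.connect(DSN) as conn:
    with conn.cursor() as cur:
        # No WHERE clause, so this touches every row.
        cur.execute("UPDATE orders SET total = 0")
        if cur.rowcount > 100:   # far more rows than intended
            conn.rollback()      # undo before anything is committed
        else:
            conn.commit()
```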

What this means for AI

Here is where I think Lakebase becomes truly transformative.

AI agents need state. They need to remember conversations. They need to store tool outputs. They need to access real-time operational data to make decisions. They need all of this with the performance of an OLTP database and the governance of an enterprise data platform.

Before Lakebase, building this meant stitching together a Postgres instance for state, a vector database for embeddings, a lakehouse for historical context, and a governance layer to keep it all secure. Four systems. Four sets of credentials. Four potential points of failure.

With Lakebase, all of this runs on one platform. Agent state lives in Lakebase. Vector search runs through pgvector in Lakebase. Historical context lives in the lakehouse that Lakebase writes to natively. Unity Catalog governs everything. One platform. One governance model. One connection string.

I built a simple agent prototype that reads operational data from Lakebase, queries historical patterns from the lakehouse, performs a vector similarity search, and writes its decisions back to Lakebase. The entire thing ran on one platform. No data movement. No sync jobs. No separate vector database.
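
Here is the rough shape of that loop. The tables, the embedding, and the decision logic are all hypothetical stand-ins; what matters is that every step goes through one Postgres connection.

```python
# Sketch of one agent step: read state, search similar cases with
# pgvector, write the decision back. All names are stand-ins, and the
# `docs` table is the one from the earlier pgvector sketch.
def agent_step(conn, event_id: int, query_embedding: list[float]) -> None:
    emb = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        # 1. Read current operational state.
        cur.execute("SELECT status FROM events WHERE id = %s", (event_id,))
        status = cur.fetchone()[0]

        # 2. Vector similarity search over past cases via pgvector.
        cur.execute(
            "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 3",
            (emb,),
        )
        similar_cases = [row[0] for row in cur.fetchall()]

        # 3. Decide (stand-in logic) and write the decision back.
        decision = "escalate" if status == "open" and similar_cases else "close"
        cur.execute(
            "INSERT INTO agent_decisions (event_id, decision) VALUES (%s, %s)",
            (event_id, decision),
        )
    conn.commit()
```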

That is not just simpler. It is faster to build, cheaper to run, and easier to govern.

What this means for data engineers

If you are learning data engineering right now, pay attention to Lakebase. Not because you need to learn it today. But because it represents where the industry is going.

The trend is unmistakable. The separation between operational systems and analytical systems is collapsing. The future is not better ETL. It is less ETL. The future is not faster pipelines between databases and warehouses. It is architectures where those pipelines are unnecessary.

That does not mean data engineers become irrelevant. It means the work changes. Less time building and maintaining data movement. More time designing data models. More time building governance frameworks. More time creating the systems that AI agents depend on. More time thinking about what the data means and less time fighting to get it from point A to point B.

The data engineers who will thrive in this world are the ones who understand both sides. Who can think about an application writing a transaction and an analyst querying a trend in the same breath. Who can design systems where operational and analytical workloads coexist on a shared foundation.

That is a more interesting job. And a more valuable one.

The bigger picture

Databricks crossed $5.4 billion in annual revenue run-rate in early 2026, growing over 65 percent year over year. They raised additional funding at a $134 billion valuation and explicitly said they would use it to accelerate Lakebase and Genie.

When a company growing at this rate places its biggest bet on a product that eliminates the boundary between applications and analytics, that is a signal worth paying attention to.

Lakebase is not a side project. It is not an experiment. It is the future of the platform. And adoption is growing at more than twice the rate of Databricks' data warehousing product.

Organizations like Warner Music Group and Hafnia are already running production workloads on Lakebase, alongside major air transport and logistics companies. The launch partner ecosystem includes global consulting firms and specialist data companies who have validated it for database modernization, real-time applications, and agentic AI workflows.

This is happening now. Not in a roadmap. Not in a preview. In production.

My take

I have been cautious about getting excited over product launches for years. I have seen too many "game-changing" announcements that turned out to be incremental improvements with good marketing.

Lakebase is not that.

The idea of writing operational data directly to lakehouse storage, governed by a unified catalog, queryable by both applications and analysts without any data movement, is genuinely new. It has been attempted before in various forms, but never with this level of integration, this level of managed simplicity, and this level of ecosystem support.

Is it perfect? No. It is still early. The feature set will expand. The regional availability will grow. The ecosystem of tools and connectors will mature. There will be edge cases and limitations that only emerge at scale.

But the fundamental architecture, the decision to collapse the wall between OLTP and OLAP into a single governed foundation, that is right. And I believe history will show that this is the moment the industry started moving in this direction in earnest.

If you are building data systems today, learn about Lakebase. If you are learning data engineering, understand why this matters. If you are making architecture decisions for your organization, evaluate whether the pipeline you are about to build could be replaced by a unified platform.

The wall between applications and analytics has stood for decades. Databricks just put a door in it. And I think most of us are going to walk through.

Thanks all!
