Databricks Community

WiliamRosa · a week ago

The Problem Nobody Likes to Admit

Imagine this scenario: your data team has built a flawless lakehouse. Ingest pipelines, bronze/silver/gold tiers, gleaming dashboards. Everything is working perfectly.

Until someone asks: "And the production app? Where does it store the transactional data?"

That's where the headache begins. You need a separate OLTP database (Postgres, MySQL, DynamoDB...), CDC pipelines to bring data into the lakehouse, reverse ETL to return enriched data to the app, and an infrastructure team to keep it all running.

The result? Data silos, synchronization latency, operational complexity, and ever-increasing costs.

Traditional Architecture (and Its Pain Points)

Here's how most companies operate today:

Pain points in this architecture:

Multiple tools and vendors to manage.
Significant latency between a write to the OLTP database and its availability in the lakehouse.
Fragmented governance: Unity Catalog cannot see the external database.
High operational costs associated with synchronization pipelines.

What is Lakebase?

Lakebase is a fully managed Postgres database natively integrated with the Databricks Data Intelligence Platform. It is designed to bridge the gap between transactional (OLTP) and analytical (OLAP) workloads, unifying everything into a single ecosystem.

In simple terms: it's like having a high-performance Postgres server living inside your lakehouse, with unified governance via Unity Catalog, native bidirectional synchronization, and modern capabilities such as autoscaling, scale-to-zero, and database branching.
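Because Lakebase speaks Postgres, an application should be able to reach it with an ordinary Postgres connection string. Here is a minimal sketch that builds a libpq-style URI using only the standard library. The host, database, user, and password are hypothetical placeholders, and `build_postgres_uri` is a helper invented for this example, not part of any Databricks SDK.

```python
from urllib.parse import quote, urlencode


def build_postgres_uri(host, database, user, password, port=5432, sslmode="require"):
    """Build a libpq-style connection URI for a Postgres-compatible endpoint."""
    # Percent-encode credentials so characters like '@' or '/' cannot break the URI.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{auth}@{host}:{port}/{quote(database, safe='')}?{query}"


# Hypothetical values; replace them with the endpoint and credentials of your own instance.
uri = build_postgres_uri(
    host="my-lakebase-host.example.com",
    database="appdb",
    user="app_user@example.com",
    password="s3cret/token",
)
print(uri)
```

The resulting string can be handed to a Postgres driver of your choice (for example, `psycopg.connect(uri)` in psycopg 3).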

The New Architecture with Lakebase:

What changes?

Zero external database infrastructure
Native bidirectional synchronization (no Debezium, no Airflow, no pain)
Unified governance through Unity Catalog
A single control plane for OLTP + OLAP

The Architectural Innovations of Lakebase

Lakebase is not "just another managed Postgres." It brings modern data engineering concepts to the transactional world.

1. Separation of Compute and Storage

Unlike traditional databases, where CPU and disk are coupled, Lakebase completely separates compute resources from storage. This means you scale each independently, paying only for what you use.

2. Copy-on-Write Storage

The storage system uses a copy-on-write approach. In practice, when you create a branch of the database, there is no data duplication; only the changes are stored separately. This makes operations like branching and restoring virtually instantaneous.
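To make the copy-on-write idea concrete, here is a toy in-memory model. It is only an illustration of the general technique, not how Lakebase storage is actually implemented: the `Branch` class and its methods are invented for this sketch. Creating a branch copies no rows; each branch keeps just its own changes on top of shared, frozen layers.

```python
class Branch:
    """Toy copy-on-write branch: private changes layered over shared, frozen data."""

    _DELETED = object()  # marker recording that a row was deleted on this branch

    def __init__(self, layers=()):
        self._layers = layers  # frozen dicts shared between branches, newest first
        self._delta = {}       # rows written on this branch only

    def branch(self):
        # Freeze the current changes into a shared layer. No rows are copied:
        # parent and child both point at the same frozen dicts.
        if self._delta:
            self._layers = (self._delta,) + self._layers
            self._delta = {}
        return Branch(self._layers)

    def put(self, key, row):
        self._delta[key] = row

    def delete(self, key):
        self._delta[key] = self._DELETED

    def get(self, key):
        # Look at private changes first, then fall back to the shared layers.
        for layer in (self._delta,) + self._layers:
            if key in layer:
                row = layer[key]
                return None if row is self._DELETED else row
        return None

    def private_row_count(self):
        return len(self._delta)


prod = Branch()
prod.put(1, {"name": "Ana"})
prod.put(2, {"name": "Bruno"})

dev = prod.branch()               # instant: nothing is duplicated
dev.put(2, {"name": "Bruno Jr"})  # visible only on dev
dev.delete(1)                     # production still has row 1
prod.put(3, {"name": "Carla"})    # written after the branch point, invisible to dev

print(prod.get(2), dev.get(2), dev.private_row_count())
```

The dev branch stores two entries of its own while still reading everything else from the shared layer, which is why branching cost does not grow with the size of the database.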

3. Autoscaling and Scale-to-Zero

The compute system automatically adjusts its capacity based on demand. During periods of inactivity, the database scales to zero, eliminating compute costs while idle. When a request arrives, it "wakes up" in seconds.
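On the client side, a wake-up that takes a few seconds can be absorbed by retrying the first connection with exponential backoff. This is a sketch under an assumption: that the first attempt after an idle period may fail or time out (the platform may well handle this transparently). `connect_with_retry` and `flaky_connect` are hypothetical names used only for this example.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Stand-in for a real driver call: fails twice, then succeeds.
calls = {"n": 0}


def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("compute still waking up")
    return "connection"


delays = []
result = connect_with_retry(flaky_connect, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper easy to test; in an application you would leave the default and pass a function that opens the real database connection.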

Database Branching: Git for Your Data

This is probably the most innovative feature. Just as developers create branches in Git to work on isolated features, Lakebase allows you to create branches for the entire database.

Powerful use cases:

Development: each developer has their own branch of the database, without interfering with production.
Migration testing: test schema changes in an isolated branch before applying them to production.
Instant restore: restore the database to any point in time (configurable window from 0 to 30 days) by creating a branch from that point.
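The restore window is easy to reason about with a little date arithmetic. The helper below is a hypothetical illustration (it is not a Lakebase API): it checks whether a target timestamp still falls inside a window of 0 to 30 days.

```python
from datetime import datetime, timedelta, timezone


def can_restore_to(target, now, window_days):
    """Return True if `target` falls inside the restore window ending at `now`."""
    if not 0 <= window_days <= 30:
        raise ValueError("window_days must be between 0 and 30")
    earliest = now - timedelta(days=window_days)
    return earliest <= target <= now


now = datetime(2025, 6, 30, 12, 0, tzinfo=timezone.utc)
five_days_ago = datetime(2025, 6, 25, 9, 0, tzinfo=timezone.utc)
a_month_ago = datetime(2025, 6, 1, 9, 0, tzinfo=timezone.utc)

print(can_restore_to(five_days_ago, now, window_days=7))  # True
print(can_restore_to(a_month_ago, now, window_days=7))    # False
```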

Two-Way Synchronization: The End of Reverse ETL

One of the biggest advantages is the native synchronization between Lakehouse and Lakebase:

Synced Tables (Lakehouse → Lakebase)

Unity Catalog tables are automatically synchronized to Lakebase, allowing applications to query rich analytical data with low latency. Synced Tables support Snapshot, Triggered, and Continuous modes.

Lakehouse Sync (Lakebase → Lakehouse)

Transactional data from Lakebase is continuously replicated to Delta tables in Unity Catalog using Change Data Capture (CDC). The destination tables follow the SCD Type 2 pattern, maintaining a complete history of changes.
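For readers unfamiliar with SCD Type 2, the sketch below shows the general pattern on an in-memory list: each change closes the current version of a row and opens a new one, so no history is lost. The column names (`key`, `valid_from`, `valid_to`) and the `apply_change` function are assumptions made for illustration; the columns that Lakehouse Sync actually writes may differ.

```python
def apply_change(history, key, values, changed_at):
    """Apply one change event to an SCD Type 2 history (a list of dict rows)."""
    # Close the currently open version of this key, if there is one.
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            row["valid_to"] = changed_at
    # values=None represents a delete: the old version is closed, nothing new is opened.
    if values is not None:
        history.append({"key": key, **values, "valid_from": changed_at, "valid_to": None})
    return history


history = []
apply_change(history, 1, {"status": "pending"}, "2025-01-01T10:00:00Z")
apply_change(history, 1, {"status": "shipped"}, "2025-01-02T08:30:00Z")

for row in history:
    print(row)

# The current state is simply the set of rows whose valid_to is still open.
current = [row for row in history if row["valid_to"] is None]
```

After the two events, the history holds both versions of order 1, and only the "shipped" row remains open.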

This completely eliminates the need for:

External CDC tools (Debezium, Fivetran)
Reverse ETL pipelines (Census, Hightouch)
Custom synchronization jobs in Airflow/Prefect

Three Strategic Use Cases

Feature Serving for Real-Time ML

Lakebase functions as an online store for Databricks' Feature Store. Features computed in the lakehouse are synchronized via Synced Tables to Lakebase, from where ML models query them with millisecond latency.
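From the model's point of view, an online feature lookup is just a primary-key query. The sketch below uses Python's built-in `sqlite3` as a local stand-in so it runs anywhere; against Lakebase you would use a Postgres driver instead (psycopg, for instance, uses `%s` placeholders rather than `?`). The table `customer_features` and its columns are hypothetical.

```python
import sqlite3

# Local stand-in for a synced feature table; names and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_features ("
    "customer_id INTEGER PRIMARY KEY, avg_ticket REAL, orders_30d INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_features VALUES (?, ?, ?)",
    [(1, 82.5, 4), (2, 19.9, 1)],
)


def get_features(conn, customer_id):
    """Fetch the feature vector for one customer with a primary-key lookup."""
    row = conn.execute(
        "SELECT avg_ticket, orders_30d FROM customer_features WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None  # unknown customer: the caller decides on defaults
    return {"avg_ticket": row[0], "orders_30d": row[1]}


print(get_features(conn, 1))
print(get_features(conn, 99))
```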

State of AI Agents

AI agents need to persist state between requests — conversation context, action history, workflow data. Lakebase provides a native transactional database to store this state with ACID consistency.
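What ACID consistency buys an agent is that its state and its action log cannot drift apart. Here is a minimal sketch, again using `sqlite3` as a local stand-in for a Postgres database (the upsert syntax shown is accepted by both). The tables `agent_state` and `agent_actions` and the function `record_step` are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_state (session_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
conn.execute("CREATE TABLE agent_actions (session_id TEXT NOT NULL, action TEXT NOT NULL)")
conn.commit()


def record_step(conn, session_id, state, action):
    """Save the new state and log the action atomically: both happen or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO agent_state (session_id, state) VALUES (?, ?) "
            "ON CONFLICT(session_id) DO UPDATE SET state = excluded.state",
            (session_id, json.dumps(state)),
        )
        conn.execute(
            "INSERT INTO agent_actions (session_id, action) VALUES (?, ?)",
            (session_id, action),
        )


def load_state(conn, session_id):
    row = conn.execute(
        "SELECT state FROM agent_state WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


record_step(conn, "s1", {"step": 1}, "search_catalog")

try:
    # The missing action violates NOT NULL, so the state update is rolled back too.
    record_step(conn, "s1", {"step": 2}, None)
except sqlite3.IntegrityError:
    pass

print(load_state(conn, "s1"))
```

The failed second step leaves the stored state at step 1, matching the single action in the log.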

Transactional Data for Applications

Databricks Apps (or any external application) can use Lakebase as their primary database. The integration is native: simply add the Lakebase project as a resource in your app. Additionally, the Data API offers a PostgREST-compatible REST interface for direct HTTP access.
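Since the Data API is described as PostgREST-compatible, requests should follow PostgREST's query conventions: `select` for columns, `column=operator.value` for filters, and `limit` for row counts. The helper below builds such a URL with the standard library. The base URL is a placeholder (the real endpoint path is not given in this post), and `build_data_api_url` is a name invented for the example.

```python
from urllib.parse import urlencode


def build_data_api_url(base_url, table, select=None, filters=None, limit=None):
    """Build a PostgREST-style query URL for reading rows from a table."""
    params = []
    if select:
        params.append(("select", ",".join(select)))
    for column, (operator, value) in (filters or {}).items():
        # PostgREST filters take the form column=operator.value, e.g. id=eq.42
        params.append((column, f"{operator}.{value}"))
    if limit is not None:
        params.append(("limit", str(limit)))
    url = f"{base_url.rstrip('/')}/{table}"
    # Keep commas readable in the select list; everything else is percent-encoded.
    return f"{url}?{urlencode(params, safe=',')}" if params else url


# Hypothetical base URL; use the endpoint shown for your own project.
url = build_data_api_url(
    "https://example.com/api",
    "orders",
    select=["id", "status"],
    filters={"customer_id": ("eq", 42)},
    limit=10,
)
print(url)
```

The URL can then be requested with any HTTP client, adding whatever authentication header your workspace requires.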

Comparison: Before and After

Availability

Lakebase Autoscaling is available in the following AWS regions:

us-east-1, us-east-2, us-west-2
ca-central-1, sa-east-1
eu-central-1, eu-west-1, eu-west-2
ap-south-1, ap-southeast-1, ap-southeast-2

The presence in sa-east-1 is particularly relevant for us in the Brazilian community, ensuring low latency for applications hosted in Brazil.

Conclusion

Lakebase represents a paradigm shift: instead of treating OLTP and OLAP as separate worlds that need complex "bridges," it unifies them into a single platform.

For Brazilian data teams, this means:

Fewer tools to manage and integrate.
Fewer pipelines that silently break down at 3 a.m.
More time focused on generating value with data.
Real governance across the entire data lifecycle — from transactional writing to the executive dashboard.

The lakehouse finally has its native transactional database. And it speaks Postgres.

This post was inspired by concepts from the official Databricks documentation. For more technical details, please refer to the Lakebase documentation.

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa