Nivethan_Venkat
Contributor III

Introduction

Databricks Lakebase is a new, fully managed OLTP (Online Transaction Processing) database engine, designed to seamlessly integrate transactional and analytical workloads within the Databricks Data Intelligence Platform. Currently available in public preview across multiple regions, Lakebase is built on a Postgres foundation and aims to bridge the gap between traditional databases and modern data lake architectures.

This is an introductory blog about Databricks Lakebase and its capabilities; more detailed information will be published in Part 2 of this blog series.

 

Motivation

Online Transaction Processing (OLTP) systems have long been the backbone of enterprise software — powering everything from banking applications to e-commerce platforms. Systems like PostgreSQL, MySQL, and Oracle have matured over decades to handle millions of transactions per second in structured, stateful workloads.

But as organisations shift toward AI-driven applications, real-time analytics, and data-centric architecture, the limitations of traditional OLTP systems become increasingly apparent.

Limitations with traditional OLTP

In this landscape, Databricks Lakebase emerges as a game-changing offering: a fully managed, PostgreSQL-compatible OLTP engine natively integrated into the Databricks Lakehouse Platform. Lakebase blends the transactional strength of Postgres with the elasticity, analytics, and governance of the Lakehouse.

 

What is Databricks Lakebase?

Lakebase allows organisations to create OLTP databases directly on Databricks, leveraging Databricks-managed storage and compute. This integration means you can run high-throughput, low-latency transactional workloads (like those traditionally handled by PostgreSQL or cloud-native OLTP systems) while keeping data in sync with your analytical Lakehouse environment.

Lakebase highlights

Key Features

Key features

Lakebase gives developers a fully managed Postgres database with cloud-native enhancements like instant provisioning, branching (think git checkout and git branch for databases), and real-time sync with Delta for analytics.

 

Working with Lakebase

Lakebase integrates closely with Databricks Unity Catalog and is managed at the workspace level. The architecture below depicts where Lakebase sits alongside the analytical layer.

Target Architecture (OLAP with OLTP in Databricks)

1. Prerequisites:

  • Unity Catalog must be enabled in your workspace
  • Access granted to Lakebase (via Admin Console or Support)

2. Enabling Lakebase:

The feature is currently in Public Preview, as highlighted in the image below, but it is expected to reach GA soon.

Enable PostgreSQL OLTP database Preview:


3. Creating a Lakebase Database:

  • Click on the Compute tab in the workspace UI
  • Navigate to the OLTP Database instances tab in the compute pane
  • Click Create database instance

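The same can be done programmatically. Below is a minimal sketch using the Databricks SDK for Python; it assumes a recent databricks-sdk version that exposes the Database Instances (Lakebase) preview API, and the instance name and capacity value are illustrative, not prescriptive.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.database import DatabaseInstance

w = WorkspaceClient()  # picks up credentials from your environment / CLI profile

# Assumption: the SDK exposes the Lakebase (Database Instances) preview API.
instance = w.database.create_database_instance(
    DatabaseInstance(
        name="my-lakebase-instance",  # hypothetical instance name
        capacity="CU_1",              # illustrative capacity unit; size to your workload
    )
)
print(instance.read_write_dns)  # hostname to use in Postgres connection strings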

4. Authentication:

Once we have a Lakebase instance, the next question that comes to mind is: how do we use it?

To use the database, one can connect via a SQL client or programmatically over JDBC in a notebook. An OAuth token is needed for an identity to connect to the database. The identity can be a Databricks user (user-to-machine) or a service principal (machine-to-machine). Tokens can be obtained from the UI or programmatically as a standard process; more on this can be found here. A minimal connection sketch follows the figures below.

Obtaining Tokens Manually

Obtaining Tokens Programmatically
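The sketch below connects to a Lakebase instance over the standard Postgres wire protocol using psycopg2. It assumes the databricks-sdk Database API is available for minting a short-lived credential; the instance name, host, and user are hypothetical placeholders you would copy from your own instance page.

import uuid
import psycopg2
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Assumption: the SDK's Database API can mint a short-lived Postgres credential.
# A token obtained manually from the UI works the same way as the password.
cred = w.database.generate_database_credential(
    request_id=str(uuid.uuid4()),
    instance_names=["my-lakebase-instance"],  # hypothetical instance name
)

conn = psycopg2.connect(
    host="instance-xxxx.database.cloud.databricks.com",  # hypothetical host from the instance page
    port=5432,
    dbname="databricks_postgres",   # default database name
    user="first.last@example.com",  # your Databricks identity
    password=cred.token,            # the OAuth token is used as the password
    sslmode="require",
)
with conn.cursor() as cur:
    cur.execute("SELECT current_user, version();")
    print(cur.fetchone())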

5. Authorisation:

As Lakebase is managed PostgreSQL, it offers both Unity Catalog and Postgres personas for governing data access. Since Unity Catalog is the unified governance component of the Databricks stack, it is inherently integrated with Lakebase. Users who prefer the PostgreSQL interface can use native PostgreSQL roles as well.

Lakebase permissions set up with different connection mechanisms

To perform database operations such as reads and writes on the Postgres database, follow the best practices for database roles and the privileges required for each role.

For more fine-grained privileges, such as row-level security (RLS) and other Postgres-native roles, refer to the links above on database roles and privileges. A small grant sketch follows the figure below.

Postgres Native Permission for Databricks Identity
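As an illustration, the sketch below issues Postgres-native grants over an existing connection (see the connection snippet above). The role, schema, and table names are hypothetical.

# Postgres-native authorisation: grant privileges to a Databricks identity,
# which Lakebase surfaces as a Postgres role. All names below are illustrative.
grants = [
    'GRANT USAGE ON SCHEMA public TO "first.last@example.com";',
    'GRANT SELECT, INSERT, UPDATE ON public.orders TO "first.last@example.com";',
]
with conn.cursor() as cur:
    for stmt in grants:
        cur.execute(stmt)
conn.commit()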

6. Using Lakebase data in analytical workloads (no ETL):

To unify the Databricks user experience, it makes sense to have a way to access and govern the database instance using Unity Catalog. A database created in Lakebase can be registered as a catalog in Unity Catalog for better governance and access provisioning. It then acts as a federated data source that can easily be used in analytical processing.

This means we can access OLTP data in our analytical workloads without any ETL, as the sketch below illustrates.

Added Database as Catalog into UC

Synced Lakebase Database in Unity Catalog
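Once the database is registered as a catalog, OLTP rows can be joined directly with Lakehouse tables. A minimal sketch in a notebook follows, with hypothetical catalog, schema, and table names.

# Join live OLTP data (via the registered Lakebase catalog) with a Delta table,
# with no ETL step in between. All names below are illustrative; `spark` is the
# ambient notebook session.
df = spark.sql("""
    SELECT o.order_id, o.amount, c.segment
    FROM lakebase_catalog.public.orders AS o        -- live Lakebase table
    JOIN main.analytics.customer_segments AS c      -- Delta table in the Lakehouse
      ON o.customer_id = c.customer_id
""")
df.show()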

7. Sending Lakebase updates to the Lakehouse (ETL):


LakeFlow Declarative Pipelines is a powerful and efficient way to create and maintain an OLAP table (DLT) that mirrors an OLTP table (Lakebase) in your Databricks Lakehouse. It simplifies handling out-of-order data and managing updates, inserts, and deletes.

Auto CDC Delta changes from OLTP to OLAP

The snippet below provides the baseline syntax for syncing Delta changes from OLTP to OLAP. Refer to the documentation for which options are necessary in your case.

import dlt

dlt.create_auto_cdc_flow(
    target = "<olap-table>",            # OLAP target table to be updated
    source = "<oltp-table>",            # OLTP source table to be referenced
    keys = ["key1", "key2", "keyN"],    # Columns that uniquely identify a row in the source data
    sequence_by = "<sequence-column>",  # Logical order of CDC events in the source data
    ignore_null_updates = False,        # When True, NULLs in the source do not overwrite target values
    apply_as_deletes = None,            # Condition under which a change event is treated as a DELETE
    apply_as_truncates = None,          # Condition under which a change event is treated as a TRUNCATE
    column_list = None,                 # Subset of source columns to include in the target
    except_column_list = None,          # Source columns to exclude from the target
    stored_as_scd_type = <type>,        # SCD type for the target table: 1 or 2
    track_history_column_list = None,        # Columns to track history for (SCD type 2)
    track_history_except_column_list = None  # Columns to exclude from history tracking
)
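For context, the flow above expects a streaming target table and a source of change records to exist in the pipeline. Below is a minimal sketch of those surrounding pieces; the table and column names are hypothetical.

import dlt
from pyspark.sql.functions import col

@dlt.view
def orders_changes():
    # Change records landed from the OLTP side (hypothetical bronze table)
    return spark.readStream.table("main.bronze.orders_cdc")

# The OLAP target must be declared before the flow can populate it
dlt.create_streaming_table("orders_silver")

dlt.create_auto_cdc_flow(
    target = "orders_silver",
    source = "orders_changes",
    keys = ["order_id"],              # unique key in the source data
    sequence_by = col("updated_at"),  # ordering column for CDC events
    stored_as_scd_type = 1,           # keep only the latest row per key
)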

8. Syncing Lakehouse updates to Lakebase (synced tables):


Data can be synced to and from a UC table. A synced / online table can be created on top of a UC table using the Create synced table option available under the Create section for any UC table.

Synced Table creation from UC table

Points to note: A synced / online table can be created under the Lakebase catalog, or in a separate catalog backed by the respective database instance.

Optionally, a Primary Key and a Timeseries Key can be provided during synced table creation for fetching the latest / new records from the OLAP table. To use the Triggered or Continuous modes for synced tables, Change Data Feed needs to be enabled on the source OLAP table, as the sketch below shows.
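A minimal sketch of enabling Change Data Feed on the source table; the table name is hypothetical, and `spark` is the ambient notebook session.

# Enable Change Data Feed on the source Delta (OLAP) table so that a synced
# table can run in Triggered or Continuous mode. Table name is illustrative.
spark.sql("""
    ALTER TABLE main.analytics.customer_segments
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")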

9. Querying Lakebase:

There are multiple options for querying tables and views in a Lakebase database.

DBSQL: The native SQL editor with a warehouse endpoint can be used to query from within Databricks, subject to the respective privileges.

Querying in DBSQL layer

Databricks Notebook: A Databricks notebook can also be used to query the Lakebase database with an interactive or SQL warehouse cluster.

Querying in Databricks Notebook

SQL Client: SQL clients can also be used to interact with the tables after authenticating with the Lakebase database instance. More information on using SQL clients can be found here.

DBeaver — SQL Client Desktop App

CLI — Using psql client to interact with Lakebase Database

10. Branching: Working with Child Instances

Lakebase allows you to create branches of your Postgres database almost instantly, which is useful in various scenarios such as:

  • Rapid Data Restoration: Instantly restore lost data by creating a database copy from a timestamp.
  • Safe Testing and Validation: Clone a recent production environment to safely test changes or run integration tests without affecting live data.
  • Compliance: Easily generate a database snapshot from any past date to support audits, reconciliations, or investigations.

Branching / Cloning DB instance

Key features include:

  • Copy-on-write: Branches are lightweight clones. They initially share the parent’s data without duplication. Storage costs only increase for the changes (deltas) made within a branch.
  • Isolation: Each branch operates independently. Changes made in one branch do not affect the parent or other branches. This is perfect for development, testing, or running experiments without impacting production data.
  • Speed: Creating a branch takes only a few seconds.
  • Connection string: Each branch gets its own unique connection string, allowing applications to connect directly to it.

Below is an example of a Lakebase database being cloned / branched without disrupting the data in the parent DB.

Branched OLTP database for additional purpose

Conclusion

Lakebase collapses the long-standing wall between OLTP and analytics. By fusing serverless Postgres semantics, AI-native branching, and lakehouse governance, it gives engineers a single surface to transact, analyse, and iterate at the speed of machine learning. For teams already on Databricks, adoption is a configuration, not a migration. And it unlocks low latency queries, elastic economics, and real-time Delta sync out of the box. As Lakebase heads toward GA later this year, the real question isn’t why you’d converge OLTP and OLAP, but how soon you’ll start.

2 Comments
BigRoux
Databricks Employee


Nice writeup! This unified OLTP+OLAP approach solves a lot of headaches we deal with when data is scattered across different systems. The database branching thing caught my eye - having dev environments that work like git branches is a game changer.

Really like the idea of real-time sync without having to build custom ETL pipelines. That's always been such a pain point. Curious to see how the GA version performs compared to what's available now in preview.

Anyone here actually tried this out with heavy transaction loads? Would love to hear about real-world performance.

Thanks for putting this together Nivethan_Venkat - the diagrams definitely help show how it all connects.

Cheers, Lou.

Sharanya13
Contributor III

Awesome details @Nivethan_Venkat . Can you please specify the real-world use cases for Lakebase?