The Databricks Data Intelligence Platform unifies data, AI, and governance so organizations can put all of their data to work. Until recently, though, operational workloads still lived outside the platform — requiring separate databases, duplicated data, and manual ETL pipelines to power applications.
Databricks Lakebase changes that. It's a fully managed PostgreSQL database that extends the Lakehouse to low-latency, transactional use cases. By bringing OLTP and OLAP together on one platform, Lakebase unlocks new real-time application patterns.
In short: your data isn’t just for analytics anymore — it can now power your apps, APIs, and ML models in real time.
The diagram above illustrates how Databricks Lakebase extends the Lakehouse from analytics into operational data serving.
On the left, we have the familiar Medallion Architecture:
Bronze → Silver → Gold tables represent the curation path of raw data into high-quality, governed datasets inside Unity Catalog. These tables power analytical workloads like BI dashboards, machine learning models, and AI agents.
On the right, Lakebase is a fully managed Postgres environment. Databricks provides a managed sync pipeline that automatically replicates selected tables into Lakebase as “Synced Tables.” This enables applications to perform low-latency reads—on the order of tens of milliseconds—without the need for custom ETL pipelines or separate databases.
Tip: In Databricks, the terms database and schema are used interchangeably inside a catalog. In Postgres (and therefore in Lakebase), a database is a higher-level container, and each database contains multiple schemas. So while a Postgres database isn't exactly the same as a Databricks catalog, that's the closest conceptual equivalent. It sits at the top level within a Lakebase instance.
Synced Tables are read-only replicas of your Lakehouse tables, refreshed through Databricks-managed sync pipelines.
You can also create Postgres-native tables directly in Lakebase. These handle inserts, updates, and deletes—ideal for managing application state, session data, or event logs alongside analytical data.
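For instance, here's a minimal sketch of a native table for session state (the table and column names are just illustrative):

```sql
-- Illustrative only: a Postgres-native table for application state,
-- living alongside your read-only synced tables.
CREATE TABLE app_sessions (
    session_id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id      bigint NOT NULL,
    last_seen_at timestamptz NOT NULL DEFAULT now(),
    state        jsonb
);

-- Standard transactional writes work as expected.
INSERT INTO app_sessions (user_id, state)
VALUES (42, '{"theme": "dark"}');

UPDATE app_sessions SET last_seen_at = now() WHERE user_id = 42;
```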
Finally, Lakebase can also be federated into Unity Catalog for metadata visibility, lightweight querying, and permission management. While this doesn't replicate the underlying data, it provides a simple way to explore and query what's stored in Lakebase without leaving Databricks.
Now that we've seen how Lakebase fits into the broader Data Intelligence Platform, let's walk through a few best practices, tips, and code snippets that will save you time when you sit down to try this out. Here's what I've learned in the field working on Lakebase with customers:
One of the first things you’ll run into with Lakebase is authentication. Lakebase uses short-lived OAuth tokens for database access — typically valid for only one hour. That’s great for security, but not so great when you’re trying to debug queries in pgAdmin.
Use the Built-In Lakebase Query Editor for Development:
For quick tests, use the PostgreSQL query editor built right into Databricks. Databricks authenticates you automatically, so you don't need to fetch a token, and it's the easiest way to validate your synced tables, run GRANTs, or inspect schemas. The PostgreSQL editor is a bit tucked away in the UI, so it can take a moment to find.
Use Databricks OAuth tokens, not Entra ID tokens
If you're on Azure, you may be tempted to generate Entra ID (Azure AD) tokens for your service principal. At the time of this writing, these will not work; they have to be Databricks OAuth tokens.
Our official documentation has great examples of how to refresh tokens for applications programmatically
If you're connecting from a Databricks App, check out the Databricks Apps Cookbook: connect to Lakebase
Use native Postgres logins when OAuth is not an option
OAuth should be your first choice for authenticating to Lakebase. However, if you have workloads that can't rotate tokens, you can configure your Lakebase instance to support native Postgres roles. This allows you to create a persistent password that can be used to authenticate to Lakebase.
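Once that's enabled, creating the role is plain Postgres. A minimal sketch (role name and password are placeholders):

```sql
-- Illustrative: a password-based role for a workload that can't rotate OAuth tokens.
-- Assumes the instance has been configured to allow native Postgres logins.
CREATE ROLE legacy_etl_user WITH LOGIN PASSWORD 'use-a-strong-generated-secret';

-- The role still needs explicit grants, covered in the next section.
```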
If you want your application, or a group of Databricks users, to query data in Lakebase, you first need to authorize them inside the Postgres database itself. While Lakebase integrates with Unity Catalog for Databricks identities, permissions are still enforced locally within Postgres when querying through Postgres interfaces, which is what you'll be using to get low-latency reads.
This is one of the most common sources of confusion for new users: an identity that can see a synced table in Unity Catalog may still be denied when querying it through a Postgres interface. That's because Postgres permissions live inside the Lakebase instance itself, not in Unity Catalog. So, even though identities are federated from Databricks, authorization must be explicitly set in Lakebase.
There are three steps: create a Postgres role for the Databricks identity, grant it access to the schema, and grant it privileges on the tables it needs. You can run these commands directly in the Lakebase Query Editor (or through a driver).
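Here's a minimal sketch of those steps, assuming a service principal named my-app-sp and synced tables in the public schema; the role-creation helper shown is illustrative, so check the Lakebase docs for the exact call for your identity type:

```sql
-- 1) Create a Postgres role mapped to the Databricks identity
--    (illustrative helper; verify the exact extension/function in the docs).
CREATE EXTENSION IF NOT EXISTS databricks_auth;
SELECT databricks_create_role('my-app-sp', 'SERVICE_PRINCIPAL');

-- 2) Let that role see the schema holding your synced tables.
GRANT USAGE ON SCHEMA public TO "my-app-sp";

-- 3) Grant only the table privileges the app needs.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "my-app-sp";
```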
If you’re bringing data into Lakebase, it’s almost always because you need low-latency lookups. In relational systems like Postgres, indexes are how you get sub-10ms reads—but only when they match your data access pattern.
When a Delta table is synced into Lakebase, the primary key you define for the synced table is indexed automatically, but other columns are not.
Let's see how this works with an example. I've created a synced table, users_synced, with 1M rows, where user_id is the primary key. Using EXPLAIN ANALYZE, we can see the query plan and query execution time.
Query Using the Primary Key (Indexed Automatically)
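Here's roughly what that looks like (the lookup value is arbitrary):

```sql
EXPLAIN ANALYZE
SELECT *
FROM users_synced
WHERE user_id = 12345;
-- Expect an Index Scan node on the primary key index in the plan,
-- with a very fast execution time.
```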
From the query plan, we can see that Lakebase is performing an index scan and we’re getting lightning fast responses.
Query Using a Non-Indexed Column
Now, let's try querying on a non-indexed field (email):
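Something like this (the email value is arbitrary):

```sql
EXPLAIN ANALYZE
SELECT *
FROM users_synced
WHERE email = 'jane.doe@example.com';
-- With no index on email, the plan falls back to a sequential scan
-- over the whole table (about 83 ms in this example).
```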
Because there was no index on email, Lakebase had to scan the entire table. 83 ms might be acceptable for some apps, but the latency grows quickly as data scales.
Query Using a Manually Created Index (email)
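A sketch of the index creation and the repeated query (the index name is illustrative):

```sql
CREATE INDEX idx_users_synced_email ON users_synced (email);

EXPLAIN ANALYZE
SELECT *
FROM users_synced
WHERE email = 'jane.doe@example.com';
-- The plan switches to an index scan, and execution time drops
-- from ~83 ms to ~0.082 ms in this example.
```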
After adding the index, the plan shows an index scan and our query execution time drops from 83 ms to just 0.082 ms.
You can also create composite indexes for queries with multiple filters.
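For instance, if your app always filters on a region and a status column (both hypothetical here):

```sql
-- Column order matters: lead with the column your queries always filter on.
CREATE INDEX idx_users_synced_region_status
    ON users_synced (region, status);

-- Served by the composite index: filters on region, or on region AND status.
SELECT *
FROM users_synced
WHERE region = 'EMEA'
  AND status = 'active';
```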
Indexes accelerate reads, but they’re not free. Every index in Lakebase is a physical structure stored alongside your data. They also add overhead to your writes and syncs. So, use indexes to boost performance, but only index what you query.
If you’re exploring Lakebase, your journey probably doesn’t stop at the Databricks UI. Most teams want a custom application or API that gives end users a fast, interactive way to work with governed data.
This is where Lakebase comes to life: it brings the reliability and governance of the Lakehouse to operational experiences, enabling sub-second reads and transactional writes directly from your applications.
Once you’ve set up OAuth authentication and Lakebase-side permissions (as covered above), your app connects to Lakebase just like any Postgres database — through a standard driver or ORM such as SQLAlchemy or psycopg. Make sure to reference the documentation for examples on rotating tokens programmatically and connection pooling.
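As a rough sketch of the rotation pattern with SQLAlchemy, where get_lakebase_token is a placeholder for however you mint a fresh OAuth token (for example via the Databricks SDK, as shown in the docs), and the user, host, and database names are illustrative:

```python
import os
from sqlalchemy import create_engine, event, text

def get_lakebase_token() -> str:
    # Placeholder: return a fresh, short-lived Databricks OAuth token.
    # In practice, mint this with the Databricks SDK or REST API per the docs.
    return os.environ["LAKEBASE_OAUTH_TOKEN"]

# Illustrative connection string: user, host, and database will differ for your instance.
engine = create_engine(
    "postgresql+psycopg2://my-service-principal@my-instance.database.cloud.databricks.com:5432/"
    "databricks_postgres?sslmode=require",
    pool_pre_ping=True,  # transparently replace connections the server has closed
)

# Inject a fresh token as the password whenever the pool opens a new connection,
# so hour-old, expired tokens are never reused.
@event.listens_for(engine, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
    cparams["password"] = get_lakebase_token()

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```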
If your application is hosted within Databricks Apps, there's an even simpler path to integrate with Lakebase. Databricks Apps are tightly integrated into the Databricks Data Intelligence Platform, and Lakebase fits naturally into that ecosystem, which lets you fast-forward much of the setup required to connect.
When you declare a Lakebase instance as a resource for your Databricks App, Databricks automatically provisions a Postgres role for the app's service principal and makes the connection details available to the app. That means you don't have to manage role creation or connection information; it's handled automatically when the app deploys.
OAuth token rotation is still your responsibility: your application needs to periodically request and refresh new tokens. The Databricks Apps Cookbook includes examples of how to do this.
When deploying through a Databricks Asset Bundle (DAB), you can promote your app and Lakebase resource definitions together across environments (dev -> stage -> prod). Here’s an example of what the apps section in your DAB YAML file would look like:
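Here's a rough sketch of what that might look like; the resource keys and permission value are illustrative, so verify them against the current Databricks Asset Bundles and Apps documentation:

```yaml
# databricks.yml (excerpt) -- illustrative field names; check the DAB schema reference
variables:
  lakebase_instance:
    description: Name of the Lakebase instance backing the app
    default: lakebase-dev

resources:
  apps:
    my_lakebase_app:
      name: my-lakebase-app
      source_code_path: ./app
      resources:
        - name: lakebase
          database:
            instance_name: ${var.lakebase_instance}
            database_name: databricks_postgres
            permission: CAN_CONNECT_AND_CREATE   # illustrative permission value

targets:
  dev:
    default: true
  prod:
    variables:
      lakebase_instance: lakebase-prod
```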
This pattern ensures consistency across deployments - the same app, the same Lakebase configuration, just parameterized for each environment.
As with jobs, clusters, and SQL warehouses, Lakebase costs show up in Databricks system tables; the difference is that Lakebase combines multiple components:
| Cost Category | SKU | How to Attribute It |
| --- | --- | --- |
| Lakebase Compute | DATABASE_SERVERLESS_COMPUTE | Tag the Lakebase instance |
| Lakebase Storage | DATABRICKS_STORAGE | Tag the Lakebase instance |
| Sync Pipeline Compute | JOBS_SERVERLESS_COMPUTE | Tag the Synced Table |
Without tagging, Lakebase costs show up as generic database spend. Tagging lets you attribute that spend to specific instances, synced tables, and the teams that own them.
You can break down Lakebase usage through the Unity Catalog system.billing.usage table:
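For example (the cost_center tag key is just an example of a tag you might apply to your instances and synced tables):

```sql
-- Roll up Lakebase-related spend by day, SKU, and a custom tag.
SELECT
  usage_date,
  sku_name,
  custom_tags['cost_center'] AS cost_center,
  SUM(usage_quantity)        AS total_usage
FROM system.billing.usage
WHERE sku_name LIKE '%DATABASE_SERVERLESS_COMPUTE%'
   OR sku_name LIKE '%DATABRICKS_STORAGE%'
   OR sku_name LIKE '%JOBS_SERVERLESS_COMPUTE%'
-- Note: JOBS_SERVERLESS_COMPUTE also covers other serverless jobs;
-- use tags on your synced tables to isolate sync-pipeline spend.
GROUP BY usage_date, sku_name, custom_tags['cost_center']
ORDER BY usage_date DESC, total_usage DESC;
```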
Lakebase isn’t just “Postgres on Databricks.” It’s a shift in how we think about operational and analytical data living together — governed, queryable, and powered by the same platform.
Instead of standing up a separate database, building reverse ETL jobs, and managing auth in two places, you can now sync governed tables into managed Postgres, serve them to your applications with millisecond reads, and write application state back, all on one platform.
Success with Lakebase isn’t about just turning it on — it’s about designing with access patterns in mind, getting auth right, and indexing only what your apps truly query.