Databricks Lakebase is a serverless Postgres database built directly into the Databricks Data Intelligence Platform. Beyond running standard transactional apps, it powers the Databricks Online Feature Store, acting as the low-latency serving layer that holds machine learning features for your production models. This integration links your offline training data directly to your online models without the operational headache of managing separate database systems.
To populate the store, you just sync your offline feature tables from Unity Catalog. Databricks replicates the data to Lakebase automatically. You can set the sync to run on a schedule or stream updates continuously, which saves you from writing and maintaining custom data pipelines.
When a recommender system is serving a live shopper, it has roughly two hundred milliseconds to fetch features, run inference, and return a decision. In that window, the model must look up who the user is, what they just clicked, and what's trending right now, then combine those signals into a ranked result. If any of those inputs i.e. the user's profile, the item catalog, the session counters, lag behind reality, the model ranks against a world that no longer exists. And if the lookups themselves are slow, so is the page.
Getting fresh, governed feature values into the scoring call without exceeding that budget is an engineering challenge. This blog covers how to solve it using Lakebase, the managed Postgres database in the Databricks Data Intelligence Platform.
Once populated, Lakebase supports multiple ingestion patterns for writing features and multiple read patterns for serving them. The right combination depends on how fresh your data needs to be and how your scoring endpoint is built. We'll walk through three patterns, each making a different trade-off between operational governance, complexity, and latency. In practice, real recommenders, fraud systems, and personalization pipelines rarely commit to just one, they mix and match across all three, feature by feature.
Figure 1. Inference flow with a ~200 ms end-to-end latency budget, showing where the Lakebase feature lookup sits.
It's worth being explicit about why a feature store sits in the stack at all before getting into how Lakebase serves one. A feature store gives teams a governed place to define a feature once and reuse it across projects, with versioning, access control, and lineage handled in Unity Catalog rather than reinvented per model. It also means each feature is computed ahead of time and looked up at inference, so the scoring path reads a stored value instead of recomputing aggregates on every request. The property that matters most for a live model is consistency: the values served at inference come from the same definitions that produced the training data, which prevents the online-offline skew that slowly erodes model quality.
Most teams already store features in Unity Catalog as Delta tables e.g. country, lifetime spend, a daily popularity score per item. These features are then used for training, analytics, and reporting. That's the offline feature store: governed, versioned, and usually queried by a SQL warehouse or a Spark job that scans columns across millions of rows. Those compute engines, optimized to run analytical workloads, excel at heavy columnar scans, but struggle with the point lookup ("give me the feature row for a user with userid 12345 right now") that a scoring endpoint makes once per request. The engine optimizes for throughput, while a sub-millisecond key lookup requires a different database shape.
Lakebase is the online feature store of the Databricks Data Intelligence Platform, designed specifically for low-latency feature serving at inference time. The FeatureEngineeringClient in the databricks-feature-engineering library exposes a publishing API that continuously syncs a Unity Catalog Delta feature table to Lakebase through a managed streaming pipeline that Databricks runs on your behalf.
A recommender system draws on two broad classes of features, split by how fast they change.
The first class tolerates features that are hours or even a day old. The second loses its value the moment it falls more than a few seconds behind.
Freshness is not free. A daily refresh is a single batch job. A per-minute or frequent refresh requires a streaming pipeline running continuously. Per-event state in near-real-time adds a custom streaming writer and a Lakebase instance sized for sustained write throughput. The engineering decision for every feature is the same: weigh the cost of keeping it fresh against the performance cost of letting it lag.
Three patterns cover most of what teams build on Lakebase.
Figure 2. The freshness ladder: feature freshness ranging from days-fresh to sub-second, each rung mapped to its serving pattern on Lakebase.
No single pattern satisfies governance, sub-second freshness, and low operational overhead at once. Production pipelines mix and match across all four rungs of the ladder, choosing the right pattern feature by feature.
Managed Publish covers most common use cases. Any feature your team would put in a Unity Catalog Delta table (and most should) can be published to Lakebase with fe.publish_table. You get UC lineage, access control, time-travel, and a managed sync pipeline. You pick the freshness with the publish_mode argument.
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()
online_store = fe.get_online_store(name="my-online-store")
fe.publish_table(
online_store=online_store,
source_table_name="main.rec.user_profile_features",
online_table_name="main.rec.user_profile_features_online",
publish_mode="CONTINUOUS",
)
Three modes:
Pick the mode by freshness requirement: TRIGGERED for a daily-refreshed user profile, CONTINUOUS for hourly item popularity, SNAPSHOT for a small static lookup.
Declarative Window is the right pattern when a feature is a windowed aggregate over a raw event source, think "sum of transaction amount per user over the last seven days" or "count of orders per user on a sliding seven-day window."
You declare the aggregate once in Python against a Unity Catalog source table. From that single materialize_features call, Databricks does three things: computes the aggregate, lands it in Unity Catalog as a Delta table for offline use, and syncs it to Lakebase as an online table for serving. The same definition drives both the point-in-time-correct training set and the low-latency lookup at inference time, with no separate publish_table step required.
Declarative Window is in Beta and requires databricks-feature-engineering >= 0.15.0 and DBR 17.0 ML or newer (Serverless is fine). The code below targets the SDK 0.15.0 API shape; earlier shapes used fe.create_feature / ContinuousWindow / entity_columns.
SlidingWindow and TumblingWindow are the two windows the platform will materialize for you. The aggregation function, the window, and the source come together on a single Feature, and materialize_features(features=..., offline_config=..., online_config=...) does the rest:
from databricks.feature_engineering import FeatureEngineeringClient
from databricks.feature_engineering.entities import (
Feature, DeltaTableSource, AggregationFunction,
Sum, Count,
SlidingWindow, TumblingWindow,
OfflineStoreConfig, OnlineStoreConfig, CronSchedule,
)
from databricks.sdk.service.ml import MaterializedFeaturePipelineScheduleState
from datetime import timedelta
fe = FeatureEngineeringClient()
tx_source = DeltaTableSource(
catalog_name="main", schema_name="rec", table_name="transactions",
)
amount_sum = Feature(
source=tx_source,
entity=["user_id"],
timeseries_column="transaction_time",
function=AggregationFunction(
Sum(input="amount"),
TumblingWindow(window_duration=timedelta(days=7)),
),
name="amount_sum_tumbling_7d",
)
order_count = Feature(
source=tx_source,
entity=["user_id"],
timeseries_column="transaction_time",
function=AggregationFunction(
Count(input="transaction_id"),
SlidingWindow(
window_duration=timedelta(days=7),
slide_duration=timedelta(days=1),
),
),
name="order_count_sliding_7d_1d",
)
registered = [
fe.register_feature(feature=f, catalog_name="main", schema_name="rec")
for f in (amount_sum, order_count)
]
fe.materialize_features(
features=registered,
offline_config=OfflineStoreConfig(
catalog_name="main", schema_name="rec",
table_name_prefix="declfeat_offline",
),
online_config=OnlineStoreConfig(
catalog_name="main", schema_name="rec",
table_name_prefix="declfeat_online",
online_store_name="my-online-store",
),
trigger=CronSchedule(
quartz_cron_expression="0 */5 * * * ?",
timezone_id="UTC",
pipeline_schedule_state=MaterializedFeaturePipelineScheduleState.ACTIVE,
),
)
Databricks automatically sets up a UC Delta table with an "offline" prefix and a synced Lakebase table with an "online" prefix for every registered feature. The refresh frequency is determined by a cron schedule, which effectively sets the floor for data freshness. For example, using a source of 50,000 transaction rows, a single execution of 0 */5 * * * ? completed the offline table writes in roughly two minutes. Notably, sliding-window features generate more rows than tumbling ones because each daily slide creates its own entry. The corresponding Lakebase synced tables, each containing 2,000 rows keyed by user_id, appeared approximately forty seconds later. You can query this online data as standard Postgres using the psycopg library:
Lakebase declfeat_online: 2,000 rows
cols: ['user_id', 'amount_sum_tumbling_7d']
row : (0, 690.86)
row : (1, 271.39)
Everything covered so far lands data on a fixed schedule, which works well when the source is a Delta table updated in batch. When the source is a Kafka topic and the model needs the most recent few seconds of activity, the same Declarative API extends to streaming.
Streaming Declarative Features (Private Preview) lets you declare a feature with a KafkaConfig source. Databricks then runs a managed pipeline that reads the topic, computes the aggregate, and writes to Lakebase, with end-to-end freshness typically in the sub-second range. RollingWindow (renamed from ContinuousWindow in SDK 0.15.0) is the window type that fits here, since trailing aggregates over a streaming topic have no batch boundaries. The Feature shape is identical to the Delta path described above; what changes is the source.
Because the same Feature definition drives both training and serving, the windowed aggregate the platform writes to Lakebase matches the training-side math exactly, the same property that makes the Delta path appealing carries over to Kafka.
To try it, get a customer workspace allowlisted through the preview request portal and follow the streaming declarative features docs.
Figure 3. Three patterns feeding one Lakebase online store. Direct Stream shows the native PostgreSQL Sink alongside the `foreach` control path for per-partition parallel writes and custom merge logic.
Some features are too dynamic for any managed publish pipeline, counts of what a user did in the current session, how fast they are clicking, when their last action happened. Recomputing these from a Delta table on a one-minute cadence is too slow for inference, and their per-user granularity makes batch aggregation across all users wasteful. Direct Stream is built for features like these.
The pattern is straightforward: you write seconds-fresh state directly to Lakebase from a streaming job. Two paths are available.
The PostgreSQL Sink handles row-for-row replication into Lakebase with managed upserts, automatic credential refresh, and deadlock-safe ordering, the right choice when you want Databricks to manage the write mechanics.
foreach gives you full control over the write, with one Lakebase connection per Spark partition and custom upsert SQL running in parallel across executors, the right choice when the write logic is non-trivial or you want the parallelism of executor-side writes.
The two paths handle different shapes of problems, and production pipelines often use both.
Structured Streaming has a native Postgres Sink. Batching, automatic retries, Lakebase credential refresh every microbatch, and deadlock-safe row ordering are all handled under the hood:
(spark.readStream.format("delta").table("main.rec.clickstream_events")
.writeStream
.format("postgresql")
.option("endpoint", "my-lakebase-endpoint")
.option("database", "databricks_postgres")
.option("dbtable", "rec.session_state")
.option("upsertkey", "user_id")
.option("checkpointLocation", "/Volumes/main/rec/ckpt/direct_stream")
.option("batchinterval", "50 milliseconds")
.outputMode("update")
.trigger(processingTime="5 seconds")
.start())
Under the hood, the sink issues INSERT ... ON CONFLICT (upsert_key) DO UPDATE SET ..., sorts rows by upsert key within each batch to avoid deadlocks, and refreshes the Lakebase OAuth token every microbatch so long-running queries don't fail on token expiry.
Supported triggers: Real-Time Mode, ProcessingTime, AvailableNow, Once. Real-Time Mode and this sink compose on DBR 18.0+, so the sub-second freshness floor from the ladder is reachable today on dedicated clusters.
The native sink is a row-for-row replicator: stream a DataFrame in, get the same rows out into Lakebase. The moment your write needs more than a single INSERT ... ON CONFLICT DO UPDATE template, or you want the parallelism of executor-side writes, it falls outside the native sink and belongs in foreach. The stateful work, collecting the last ten actions per user, counting distinct products viewed in the last minute, computing per-user velocity, still happens upstream in Structured Streaming. foreach is just the sink that lands those windowed rows.
foreach works best whenever the write needs more than a row-by-row copy:
Here's the shape, drawn from a production Lakebase pipeline that maintains two features per user (the last ten actions, and a count of distinct products viewed) keyed on a one-minute tumbling window. The streaming aggregation does the heavy lifting in Spark:
from pyspark.sql import functions as F
from pyspark.sql.functions import (
window, collect_list, struct, expr, approx_count_distinct, when,
)
result = (
events
.withWatermark("event_ts", "1 minute")
.groupBy(F.col("user_id"), window(F.col("event_ts"), "1 minute"))
.agg(
collect_list(struct("event_ts", "action", "product_id")).alias("actions"),
approx_count_distinct(
when(F.col("action") == "view_pdp", F.col("product_id"))
).alias("distinct_products_viewed"),
)
.select(
F.col("user_id"),
F.col("window.start").alias("window_start"),
F.col("window.end").alias("window_end"),
expr("""
transform(
slice(sort_array(actions, false), 1, 10),
x -> concat(x.action, ':', x.product_id, ':', cast(x.event_ts as string))
)
""").alias("user_last_10_actions"),
F.col("distinct_products_viewed"),
)
)
The writer opens one psycopg3 connection per partition, mints a fresh OAuth token via the Databricks SDK, and upserts the windowed rows with INSERT ... ON CONFLICT DO UPDATE:
import psycopg
from databricks.sdk import WorkspaceClient
class LakebaseForeachWriter:
UPSERT_SQL = """
INSERT INTO public.user_feature_store
(user_id, window_start, window_end,
user_last_10_actions, distinct_products_viewed)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (user_id, window_start, window_end)
DO UPDATE SET
user_last_10_actions = EXCLUDED.user_last_10_actions,
distinct_products_viewed = EXCLUDED.distinct_products_viewed,
updated_at = now();
"""
def __init__(self, endpoint_name, db_name, flush_size=500):
self.endpoint_name = endpoint_name
self.db_name = db_name
self.flush_size = flush_size
def open(self, partition_id, epoch_id):
w = WorkspaceClient()
endpoint = w.postgres.get_endpoint(name=self.endpoint_name)
cred = w.postgres.generate_database_credential(endpoint=self.endpoint_name)
self.conn = psycopg.connect(
host=endpoint.status.hosts.host,
port=5432,
dbname=self.db_name,
user=w.current_user.me().user_name,
password=cred.token,
sslmode="require",
)
self.buffer = []
return True
def process(self, row):
self.buffer.append((
row["user_id"],
row["window_start"],
row["window_end"],
list(row["user_last_10_actions"]),
int(row["distinct_products_viewed"]),
))
if len(self.buffer) >= self.flush_size:
self._flush()
def close(self, error):
if error is None and self.buffer:
self._flush()
self.conn.close()
def _flush(self):
with self.conn.cursor() as cur:
cur.executemany(self.UPSERT_SQL, self.buffer)
self.conn.commit()
self.buffer.clear()
(result.writeStream
.outputMode("update")
.foreach(LakebaseForeachWriter(ENDPOINT_NAME, DB_NAME))
.option("checkpointLocation", CHECKPOINT_LOCATION)
.trigger(processingTime="15 seconds")
.start())
A few notes on the writer:
One final architectural constraint: Training and serving must agree. When using Direct Stream, writes hit Lakebase directly and bypass the offline store, which means the streaming logic and training pipeline can drift. If the training math for "events in the last minute" deviates even slightly from the streaming job, the model encounters a different distribution in production than it did during development. You must pin the logic down in one place. For greater robustness, have your streaming writer append those same feature rows to a Unity Catalog Delta table. This ensures your training sets are composed of the exact values served online, rather than values re-derived after the fact. (This consistency is precisely what Declarative Window automates for SlidingWindow and TumblingWindow shapes; for custom shapes beyond those, you are responsible for maintaining that parity.)
A production recommender typically uses all three patterns simultaneously, assigning each signal to the pattern that best matches its freshness and governance requirements.
Figure 4. Hybrid recommender architecture: user-profile features via Managed Publish, item-popularity windowed aggregates via Declarative Window, session state via Direct Stream, training via the same Declarative Window features plus on-the-fly `RollingWindow` aggregates, all joined by a single Model Serving endpoint.
At inference time one Model Serving endpoint reads a FeatureSpec that joins the three online sources. A single request fans out into three parallel reads against Lakebase: one feature lookup on user_id, one on item_id, and one on user_id again for session state, followed by scoring and the response. The tail lookup dominates the latency budget; the rest is scoring and network.
Managed Publish carries the lowest operational cost, with minute-fresh as its ceiling. Declarative Window lets a windowed aggregate train and serve from a single definition, with the cron cadence setting the cost; RollingWindow handles the trailing aggregates that don't need an online copy. Direct Stream gets you seconds-fresh data and the most engineering work to own. Production recommenders run all three at once, and the design call for every feature is the same: weigh the cost of keeping it fresh against the cost of letting it lag, and place it accordingly.
One note before you start building: the native PostgreSQL Sink and Streaming Declarative Features on Kafka are both in Preview today. If you build with Direct Stream now, keep your Lakebase schemas and primary keys stable so the managed path can replace your code without touching the inference layer.
A snappy recommendation page, like showing running socks the moment someone adds shoes to their cart, happens because someone decided that "items clicked in this session" had to be seconds-fresh while "lifetime average order value" could sit a day stale, and then routed each signal accordingly. That routing decision, signal by signal, is where the work happens. If you skip that decision, you either pay to refresh things that don't need refreshing or ship a recommendation that lags behind the customer.
A 200-millisecond latency budget means you have to match each feature to the right lookup mechanism. There's no reason to update stable user profiles with the same frequency as sub-second session counters.
With Databricks Lakebase, you can choose from three online serving paths depending on your latency needs. Managed Publish handles minute-level updates. For consistent windowed aggregates, you can use Declarative Window. And when you need seconds-fresh operational state, Direct Stream writes directly to the serving layer.
The hard engineering work is the routing where every feature requires a deliberate choice: you have to weigh the cost of real-time processing against the performance impact of data lag. Getting this right lets you run models on active, real-time data instead of relying on static warehouse tables.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.