Lakebase Blogs
Discover curated blogs from the community and experts. Learn through practical guides, use cases, and thought pieces designed to help you get more out of Lakebase.
uday_satapathy
Databricks Employee

Summary

  • Why systems catering to tactical, operational decision-making need to be different from traditional OLTP and OLAP
  • How a Lakebase-centric architecture for ODS solves this need
  • What business use cases gain value from an ODS built on top of Lakebase

 

Ask a data engineer to draw the architecture of their stack, and you’ll often see the same two things: OLTP systems on the left, a data warehouse or lakehouse on the right. Transactions on one side, analytics on the other.

What’s missing? The middle.

Operational teams, such as support agents, fraud analysts, ops managers, and field service coordinators, don’t need analytics on stale data. They need the current state of the world, right now, to act on. Sub-minute freshness. Fast queries. Data from multiple source systems integrated into a single operational view.

The Operational Data Store (ODS) was designed specifically for this. The concept is decades old, but the need has never gone away. Most modern stacks just don’t name it or build it cleanly, and they pay the price in fragile workarounds.

Lakebase on Databricks is a natural fit for a modern ODS.

What an ODS is (and is not)

The book Building the Operational Data Store (W. H. Inmon, Claudia Imhoff, and Greg Battas) defines ODS as “a subject-oriented, integrated, current, volatile collection of data used to support tactical decision-making.”

The key tenets from this definition are:

  • Subject-oriented: organized around business subjects and their related entities (customer, order, product, risk), not around source systems
  • Integrated: data from multiple operational systems, resolved into a unified view
  • Current: focused on the present state of those entities, not their long history
  • Volatile: data changes as the real world changes. The ODS reflects that
  • Tactical: serving operational decisions, not strategic analytics

An ODS is not

  • a data warehouse or lakehouse, which are suited for strategic analytics, history, and heavy computation
  • a staging area for ETL
  • a departmental reporting database
  • a catch-all “operational data dumping ground”

The ODS is a current-state data product layer. It’s an operational integration store that sits between transactional systems and the analytical layer.


 Figure 1: ODS Positioning

Which layer answers which question?

A practical routing test: if the question is about the current state of a specific entity and the answer drives an immediate action, it belongs in the ODS. If it’s about trends, aggregates, or patterns across time or population, it belongs in the lakehouse.

 

| Ask the ODS (Lakebase) | Ask the Data Warehouse / Lakehouse |
| --- | --- |
| **Operational – current state** | |
| The operations command center needs the current fulfillment status of order #12345 across order, payment, and shipment systems. | Supply chain leadership dashboard: What was our order fulfillment rate last quarter? |
| The fraud console shows that the customer’s account is currently flagged for fraud. | Risk analytics team: Which fraud patterns are most prevalent this year? |
| The support dashboard looks for open support cases for customer X. | Customer success leadership quarterly review: How do support resolution times trend by customer tier? |
| Financial workflows need to know a customer’s current risk score before extending credit. | Fraud data science team: How has fraud model accuracy changed month-over-month? |
| **Operational – inventory and field** | |
| The store manager wonders what the current sellable inventory is for product Y at store X (on-hand, reserved, and in-transit). | Merchandising director: Which products have the highest stockout frequency by region? |
| The customer support rep needs to know which technicians have worked on work order Z in the last week. | Service delivery dashboard: Which regions have the highest SLA breach rates? |
| **Cross-system integration** | |
| Daily, the account manager checks the combined view of her customers across CRM, billing, and support. | Growth analytics team: How does retention differ between customers active in both CRM and ecommerce? |
| While resolving a payment issue, the support escalation team looks for a matching payment record in the payments system against a sales order. | FinOps dashboard: What percentage of orders have mismatched payment records by channel over the last year? |
| The relationship manager needs a consolidated account status across all product lines for a customer. | Product marketing monthly review: Which product line combinations have the highest cross-sell success rate? |

 

The problem with “ODS by accident”

Most organizations have an ODS need, though few build it intentionally. Instead, they end up with a collection of fragile workarounds:

  • dashboards querying production OLTP directly
  • ad-hoc Redis caches, one per app
  • one-off sync jobs cobbled together per team
  • warehouse tables being used as pseudo-operational databases
  • point-to-point pipelines multiplying over time

The result is operational pain that compounds:

  • Inconsistent data definitions across apps, because each team defines “customer status” differently
  • Duplicated ETL logic: the same joins written in five different places
  • Production OLTP systems under load from reporting queries, because there is no single authoritative current-state view

You end up with the ODS pattern anyway, just without naming it, designing it, or governing it.


Figure 2: ODS Anti-Pattern

 

Why Lakebase fits the ODS pattern

Lakebase is a managed Postgres service inside the Databricks platform. That combination of capabilities maps cleanly onto ODS requirements:

| ODS Need | Lakebase Capability |
| --- | --- |
| Low-latency operational queries | Standard Postgres interface, millisecond response |
| Current-state entity storage | Upserts, transactional writes, ACID semantics |
| Relational integration across entities | Relational joins across Postgres tables |
| Operational app connectivity | Standard Postgres drivers; no special client needed |
| CDC/streaming ingestion | Kafka/Debezium events flow through streaming ETL (Structured Streaming or Lakeflow Spark Declarative Pipelines) into the lakehouse (or into Lakebase directly), then sync into Lakebase via Synced Tables |
| Lakehouse integration | Native syncing from Unity Catalog via Synced Tables |
| Safe schema evolution | Branching lets you test schema changes against a copy of the data before applying them to production |

The branching capability deserves emphasis. In a live operational system, schema changes are risky. Lakebase branches let you fork the database, test a migration, and promote it confidently – without any downtime to the operational apps reading the main branch.

Reference architecture

Here’s what a well-designed Lakebase ODS architecture looks like end-to-end:

  1. Source operational systems (ERP, e-commerce, CRM, ticketing, payment processors) continuously produce change events.
  2. Ingestion layer captures data from the source systems in four ways:  
    1. Lakeflow Connect routes snapshots and change events directly into the lakehouse
    2. Change Data Capture via Debezium or similar tools → Kafka (or a similar message broker) → Lakehouse, implemented with Structured Streaming or Lakeflow Spark Declarative Pipelines writing to Unity Catalog tables as sinks.
    3. Change Data Capture via Debezium or similar tools → Kafka (or a similar message broker) → Lakebase. This pattern routes the data into Lakebase with minimal latency because there are fewer hops. You can implement it with Structured Streaming or Lakeflow Spark Declarative Pipelines. As a best practice, know what delivery guarantees your streaming sink provides (retries, idempotency, at-least-once or exactly-once delivery), and add custom logic to build those guarantees where needed.
    4. Zerobus (Databricks’ high-throughput event bus) routes operational data directly into the lakehouse with millisecond latency
  3. Lakehouse (Delta / Iceberg, bronze/silver/gold) receives the full history:
    1. bronze: raw change events and source extracts
    2. silver: cleaned, standardized, entity-resolved records
    3. gold: computed outputs – risk scores, customer segments, fulfillment SLA status, feature outputs
  4. Synced Tables pull selected Unity Catalog tables from the gold layer into Lakebase as read-only Postgres tables. No hand-built reverse ETL. Managed, governed, and automatically refreshed.
  5. Lakebase ODS, through Postgres joins at query time, combines:  
    1. synced curated data (read-only, lakehouse-computed)
    2. native operational tables (writable: workflow state, cases, actions, approvals) 
  6. Operational consumers (support UIs, fraud triage tools, ops portals, dashboards, APIs) query Lakebase directly via standard Postgres drivers.
  7. Data aging  –  as records age out of their operational relevance window, Lakehouse Sync can move them from Lakebase back into the lakehouse for long-term retention and historical analysis. App data and operational outcomes written into Lakebase may also be routed to the lakehouse for analytics. This closes the loop: the lakehouse feeds the ODS with curated intelligence, and the ODS returns aged operational history to the lakehouse. Aging matters especially for pattern 2.3, where operational data is written directly into Lakebase; eventually such datasets grow too large for tactical use and must be offloaded or purged.
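The delivery-guarantee concern in step 2.3 can be sketched in plain Python. This is a simulation, not a Lakebase API: the event shape (`key`, `version`, `op`, `row`) is an assumed Debezium-style envelope, and the dict stands in for a keyed current-state table. The point is that with at-least-once delivery, each apply must be idempotent, so a redelivered or late-replayed event cannot regress the state.

```python
# Minimal sketch (assumed event shape, not a Lakebase API): applying a CDC
# event stream to a keyed current-state store. With at-least-once delivery,
# events can arrive twice or out of order, so each apply is idempotent: an
# event only wins if its version is newer than what we already hold.

def apply_event(state: dict, event: dict) -> None:
    """Apply one Debezium-style change event to the current-state dict."""
    key, version, op = event["key"], event["version"], event["op"]
    current = state.get(key)
    if current is not None and current["version"] >= version:
        return  # duplicate or stale event: safe to drop (idempotency)
    if op == "d":  # delete becomes a tombstone, not a hard removal
        state[key] = {"version": version, "deleted": True, "row": None}
    else:  # "c" (create) or "u" (update)
        state[key] = {"version": version, "deleted": False, "row": event["row"]}

state = {}
events = [
    {"key": "order-1", "version": 1, "op": "c", "row": {"status": "placed"}},
    {"key": "order-1", "version": 2, "op": "u", "row": {"status": "shipped"}},
    {"key": "order-1", "version": 2, "op": "u", "row": {"status": "shipped"}},  # redelivery
    {"key": "order-1", "version": 1, "op": "c", "row": {"status": "placed"}},   # late replay
]
for e in events:
    apply_event(state, e)

print(state["order-1"]["row"])  # → {'status': 'shipped'}: replays did not regress the state
```

In a real pipeline the same version-comparison logic would live in the streaming sink (for example inside a `foreachBatch` upsert), keyed on a stable primary key.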

The lakehouse retains the full history and remains the system for heavy analytical computation. Lakebase serves the operational ‘present’.


Figure 3: Reference Architecture

Synced Tables: the missing ODS primitive

The Synced Tables feature is what makes Lakebase particularly compelling for ODS design. It lets you create a Postgres table in Lakebase that automatically mirrors a Unity Catalog table. No pipelines to maintain, no reverse ETL logic to write.

From an ODS perspective, this is the key primitive because it creates a clean, governed split:

  • Unity Catalog = curated, computed, analytical-grade data (the authoritative source)
  • Lakebase ODS = low-latency serving layer (reads, operational joins, app writes)

Some important things to know about Synced Tables:

  • Synced tables should be kept read-only in Postgres to preserve source integrity; apps must not be allowed to accidentally mutate curated data.
  • The data is synchronized automatically, and the degree of freshness is managed through simple configuration knobs.
  • Synced tables are queryable via standard Postgres, so app developers need no special interface.
  • Inside Postgres, synced tables follow the grants and permissions assigned to Postgres roles, while the roles themselves originate from Unity Catalog, which governs synced tables centrally via lineage, access control, and lifecycle management.

The ODS superpower is query-time joins between synced curated data and native writable operational tables.

  • synced customer_current + native support_cases
  • synced risk_scores + native fraud_case_queue
  • synced product_catalog + native order_orchestration_state

Logic computed once in the lakehouse surfaces instantly at query time in the operational context, without duplicating that logic in every app.
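That query-time join can be sketched concretely. SQLite stands in for Postgres here so the example is self-contained and runnable; the SQL itself is plain ANSI and would read the same against Lakebase. The table and column names (`customer_current`, `support_cases`, `tier`, `risk_flag`) follow the article’s examples but are illustrative assumptions, not a prescribed schema.

```python
import sqlite3

# Sketch of the ODS query-time join: a synced, read-only curated table
# (customer_current, computed in the lakehouse) joined with a native
# writable table (support_cases, owned by the app).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_current (      -- synced from Unity Catalog (read-only)
    customer_id TEXT PRIMARY KEY,
    tier        TEXT,
    risk_flag   INTEGER
);
CREATE TABLE support_cases (         -- native Lakebase table (writable)
    case_id     TEXT PRIMARY KEY,
    customer_id TEXT,
    status      TEXT
);
INSERT INTO customer_current VALUES ('c1', 'gold', 0), ('c2', 'silver', 1);
INSERT INTO support_cases VALUES ('k1', 'c1', 'open'), ('k2', 'c2', 'open');
""")

# The support cockpit query: open cases enriched with curated context.
rows = conn.execute("""
    SELECT s.case_id, c.tier, c.risk_flag
    FROM support_cases s
    JOIN customer_current c USING (customer_id)
    WHERE s.status = 'open'
    ORDER BY s.case_id
""").fetchall()
print(rows)  # → [('k1', 'gold', 0), ('k2', 'silver', 1)]
```

The lakehouse computed `tier` and `risk_flag` once; the app only joins, it never re-derives them.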

Another useful pattern is to bring in synced tables from two different OLTP domains and join them in Lakebase at query time.

 


Figure 4: Synced Tables Zoom-In

ODS design patterns in Lakebase

Current-state tables (synced from lakehouse)

These are the read-only backbone of the ODS. Synced from gold-layer Unity Catalog tables:

  • customer_current: integrated customer profile, latest segment, tier, risk flag
  • order_current: current order status, fulfillment stage, SLA status
  • inventory_position: current availability and stockout risk by SKU/location
  • shipment_status_latest: latest carrier event and estimated delivery

Short history tables (optional, in native Lakebase, or synced from lakehouse)

For operational context that needs recent – but not full – history:

  • order_status_history_30d: status transitions for the last 30 days (enough for ops, without the warehouse bulk)
  • risk_event_history_7d: recent risk signals for active fraud investigation

Keep these scoped. Long historical analysis belongs in the lakehouse.

Native operational tables (writable in Lakebase)

These are app-owned:

  • cases, tasks, escalations: workflow state
  • approvals, assignments: action tracking
  • analyst_notes, dispositions: human decisions and annotations

Idempotent upserts

Writes to native tables should be idempotent so retries are safe. Use INSERT ... ON CONFLICT DO UPDATE (Postgres upsert / merge) with a stable primary key.
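A minimal sketch of that upsert, using SQLite (3.24+) because its `INSERT ... ON CONFLICT DO UPDATE` clause matches the Postgres syntax you would run against Lakebase. The `cases` table and its columns are illustrative.

```python
import sqlite3

# Idempotent upsert for a native ODS table: a stable primary key plus
# ON CONFLICT DO UPDATE means a retried write converges on the same row.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE cases (
    case_id    TEXT PRIMARY KEY,   -- stable key makes retries safe
    status     TEXT,
    updated_at TEXT
)
""")

UPSERT = """
INSERT INTO cases (case_id, status, updated_at)
VALUES (?, ?, ?)
ON CONFLICT (case_id) DO UPDATE SET
    status     = excluded.status,
    updated_at = excluded.updated_at
"""

conn.execute(UPSERT, ("case-42", "open", "2026-03-26T10:00:00Z"))
conn.execute(UPSERT, ("case-42", "escalated", "2026-03-26T10:05:00Z"))
conn.execute(UPSERT, ("case-42", "escalated", "2026-03-26T10:05:00Z"))  # retry replays safely

print(conn.execute("SELECT COUNT(*), status FROM cases").fetchone())  # → (1, 'escalated')
```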

Soft deletes and tombstones

Rather than hard deleting rows, use a deleted_at timestamp or is_active flag. This keeps audit trails intact for operational workflows.
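The pattern in miniature, again with SQLite standing in for Postgres and illustrative names (`tasks`, `is_active`, `deleted_at`):

```python
import sqlite3

# Soft-delete sketch: rows are tombstoned rather than removed, so the
# audit trail survives while operational reads only see active rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (task_id TEXT PRIMARY KEY, "
    "is_active INTEGER DEFAULT 1, deleted_at TEXT)"
)
conn.executemany("INSERT INTO tasks (task_id) VALUES (?)", [("t1",), ("t2",)])

# "Delete" t1 by flagging it instead of issuing DELETE.
conn.execute(
    "UPDATE tasks SET is_active = 0, deleted_at = '2026-03-26T11:00:00Z' "
    "WHERE task_id = 't1'"
)

active = conn.execute("SELECT task_id FROM tasks WHERE is_active = 1").fetchall()
total = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
print(active, total)  # t1 is hidden from operational reads but still auditable
```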

Freshness columns

Make freshness explicit and queryable. Give every synced table a column recording when the source record was committed in the upstream system, plus a way to tell when Synced Tables last refreshed the row into Lakebase.

Apps and dashboards can then surface warnings when data falls outside its freshness SLA.
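A staleness check can be as small as a timestamp comparison. The column name `source_committed_at` and the 5-minute SLA are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

# Sketch of an explicit-freshness check: flag rows whose upstream commit
# is older than the agreed staleness SLA. Names and threshold are assumed.
STALENESS_SLA = timedelta(minutes=5)

def is_stale(source_committed_at: datetime, now: datetime) -> bool:
    """True when the upstream commit is older than the freshness SLA."""
    return now - source_committed_at > STALENESS_SLA

now = datetime(2026, 3, 26, 10, 30, tzinfo=timezone.utc)
fresh_row = datetime(2026, 3, 26, 10, 28, tzinfo=timezone.utc)  # 2 minutes old
stale_row = datetime(2026, 3, 26, 10, 10, tzinfo=timezone.utc)  # 20 minutes old

print(is_stale(fresh_row, now), is_stale(stale_row, now))  # → False True
```

The same predicate works as a SQL `WHERE` clause in a dashboard query against the freshness column.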

Operational indexes

Index for your operational query patterns, not analytical scan patterns:

  • customer_id lookups (support cockpit)
  • case_status + assigned_to (work queue filtering)
  • risk_score + flagged_at (fraud triage ordering)

Don’t copy analytical index strategies. Tune for the app.
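For example, the work-queue filter above maps to one composite index. SQLite stands in for Postgres (the `CREATE INDEX` reads the same), and `EXPLAIN QUERY PLAN` shows the query being served by the index rather than a full scan; the table shape is illustrative:

```python
import sqlite3

# Composite index matching the work-queue access pattern
# (case_status + assigned_to), verified via the query plan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (case_id TEXT PRIMARY KEY, "
    "case_status TEXT, assigned_to TEXT)"
)
conn.execute("CREATE INDEX idx_cases_queue ON cases (case_status, assigned_to)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT case_id FROM cases "
    "WHERE case_status = 'open' AND assigned_to = 'agent-7'"
).fetchall()
print(plan)  # the plan detail names idx_cases_queue
```

In Postgres you would check the same thing with `EXPLAIN`, looking for an index scan on the equivalent index.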

ODS freshness tiers

Different parts of your ODS will have different freshness requirements. Inspired by the ODS class framework from Corporate Information Factory by W. H. Inmon, Claudia Imhoff, and Ryan Sousa, here’s how those tiers map to Lakebase:

| Freshness tier | Latency | Pattern | Lakebase fit |
| --- | --- | --- | --- |
| Real-time | Seconds or lower | Near-synchronous with source. Minimal transformation | Zerobus into lakehouse + continuous Synced Tables, or streaming writes into Lakebase |
| Near-real-time | Minutes | Store-and-forward. Meaningful integration possible | Scheduled/triggered Synced Tables |
| Batch | Hours / daily | Overnight processing. Rich integration and transformation | Scheduled Synced Tables. Daily pipeline outputs |
| Feedback | Irregular / event-driven | Lakehouse-derived scores and segments pushed back to ODS | Gold-layer Unity Catalog tables synced into Lakebase |

Lakebase is especially strong for real-time and near-real-time serving (where freshness is critical) and for the feedback tier (where the lakehouse enriches the ODS with computed intelligence).

Most real-world implementations span multiple tiers: fraud signals at real-time, support enrichment at near-real-time, reference dimensions at batch.

What not to do (keeping your ODS an ODS)

The ODS is easy to abuse once teams realize it’s fast and queryable. Avoid:

  • Dumping all historical data into it. Long-term history and large analytical scans belong in the lakehouse.
  • Using it for ad hoc BI exploration. It’s not a warehouse. Optimize it for operational patterns.
  • Storing giant denormalized fact tables. They belong in the analytical layer.
  • Letting it become a catch-all staging area. Every table in the ODS should have a named operational consumer.

With Lakebase specifically:

  • Sync only what is needed for tactical serving, not entire source tables
  • Be explicit about freshness SLAs per table
  • Keep analytical workloads on the lakehouse side of the Synced Tables boundary

Example use cases

Customer support cockpit

  • Synced from lakehouse: customer profile, subscription status, entitlements, risk flags, and recent purchase history summary
  • Native Lakebase: support cases, assignments, escalation state, SLA clock
  • Outcome: a fast agent UI with complete current customer context, without hitting OLTP or running lakehouse queries at interaction time

Fraud triage

  • Synced from lakehouse: latest risk scores, model-derived signals, event-pattern flags
  • Native Lakebase: fraud case queue, analyst notes, disposition workflow, hold/release decisions
  • Outcome: an immediate action loop where analysts work on top of curated ML outputs, with their decisions captured operationally

Retail operations

  • Synced from lakehouse: product availability, predicted stockout risk, fulfillment metrics, supplier lead times
  • Native Lakebase: store-level actions, manager overrides, replenishment workflow, audit log
  • Outcome: a tactical decision layer for store and regional ops teams with real-time inventory context and captured action history

Field service coordination

  • Synced from lakehouse: current work orders, technician skill profiles, SLA breach risk scores, parts availability
  • Native Lakebase: technician assignments, dispatch state, in-progress updates, completion records
  • Outcome: a live dispatch and coordination layer where the operational workflow state and the analytical risk picture are always in the same query

Closing

The ODS never went away. We just stopped drawing it on architecture diagrams – and replaced it with a tangle of point-to-point pipelines, ad hoc caches, and abused warehouse tables.

Lakebase makes it straightforward to build this layer cleanly inside the Databricks ecosystem, leading to a cleaner target architecture:

  • Lakehouse for compute, history, and analytical intelligence
  • Lakebase for low-latency, current-state operational serving
  • Synced Tables as the governed bridge between the two

That’s a much better pattern than forcing your warehouse to act like an app database or letting operational dashboards hammer production OLTP.

If your team is wrestling with operational analytics, support tooling, fraud workflows, or real-time ops visibility, you don’t need a new concept. You need the concept the industry already figured out, rebuilt cleanly on a modern stack.

Build the ODS. Use Lakebase.

Contributors