Delivering Lakehouse insights to external operational applications has long been a headache for low-latency use cases, often requiring complex and brittle integrations. Databricks Lakebase Synced Tables provide a native Reverse ETL solution that bridges the gap between the Lakehouse and the low-latency, Postgres-backed Lakebase.
Native Reverse ETL
The traditional Reverse ETL paradigm involved moving data from a warehouse to a separate operational database, often creating governance and management issues. Lakebase is a high-performance OLTP database built into the Data Intelligence Platform with native Reverse ETL (RETL) support. The operational database is no longer a silo: it is a governed extension of the Lakehouse, managed through Unity Catalog.
Synced Tables
A Synced Table is a read-only Postgres table in Lakebase Provisioned that syncs data from a Unity Catalog table using one of several sync modes. It is a managed projection of the analytical data, designed for low-latency application reads. Because synced tables are read-only, organizations can maintain a single source of truth in the Lakehouse while serving front-end applications at low latency.
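The read-only projection idea can be illustrated with a small, purely in-memory Python sketch. It does not call any Databricks or Postgres API; `lakehouse_table`, `synced_table`, and `sync` are hypothetical names invented for illustration, and a plain dictionary stands in for the Unity Catalog source table.

```python
from types import MappingProxyType

# Toy stand-in for the source of truth in the Lakehouse: key -> row.
lakehouse_table = {1: {"customer": "acme", "score": 0.91}}

# Private copy that only the sync process is allowed to modify.
_synced_copy = {}

# Applications only ever see this live, read-only view of the copy.
synced_table = MappingProxyType(_synced_copy)


def sync():
    """Refresh the projection from the source (hypothetical helper)."""
    _synced_copy.clear()
    for key, row in lakehouse_table.items():
        # Each row is wrapped too, so readers cannot mutate row contents.
        _synced_copy[key] = MappingProxyType(dict(row))


sync()

# Reads work as usual.
current_score = synced_table[1]["score"]

# Writes through the projection are rejected with a TypeError.
try:
    synced_table[2] = {"customer": "globex", "score": 0.4}
    write_rejected = False
except TypeError:
    write_rejected = True
```

All changes go through the source: updating `lakehouse_table` and calling `sync()` again is the only way the projection changes, which is the single-source-of-truth property described above.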
Sync Pipelines
Synchronization is powered by Lakeflow Spark Declarative Pipelines. When a synced table is created, Databricks provisions a managed pipeline that monitors the source table for changes. This eliminates the need for hand-built, brittle pipelines to keep an external operational database in sync; the managed pipeline handles orchestration and monitoring.
Sync Modes
Lakebase offers multiple sync modes to suit different scenarios:
Snapshot: Ideal for batch-oriented data; performs a full refresh on demand or on a schedule.
Triggered: A middle ground that processes incremental changes each time the pipeline is executed.
Continuous: The gold standard for low-latency apps; syncs changes rapidly so that the synced table reflects Lakehouse changes.
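To make the difference between a full refresh and incremental processing concrete, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not how Lakebase or the managed pipeline is implemented: `SourceTable`, `SyncedTable`, and their methods are hypothetical names, and a simple append-only change log stands in for whatever change tracking the real pipeline uses.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTable:
    """Toy source table: current rows plus an append-only change log."""
    rows: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)  # (version, key, row or None)

    def upsert(self, key, row):
        self.rows[key] = row
        self.changes.append((len(self.changes) + 1, key, row))

    def delete(self, key):
        self.rows.pop(key, None)
        self.changes.append((len(self.changes) + 1, key, None))


@dataclass
class SyncedTable:
    """Toy target table that remembers the last source version it applied."""
    rows: dict = field(default_factory=dict)
    last_version: int = 0

    def snapshot(self, source):
        """Snapshot-style sync: copy every row. Returns rows copied."""
        self.rows = dict(source.rows)
        self.last_version = len(source.changes)
        return len(self.rows)

    def triggered(self, source):
        """Triggered-style sync: apply only changes since the last run."""
        pending = [c for c in source.changes if c[0] > self.last_version]
        for version, key, row in pending:
            if row is None:
                self.rows.pop(key, None)  # a None row marks a delete
            else:
                self.rows[key] = row
            self.last_version = version
        return len(pending)


source = SourceTable()
for i in range(1000):
    source.upsert(i, {"value": i})

full, incremental = SyncedTable(), SyncedTable()
full.snapshot(source)          # initial load copies all 1000 rows
incremental.triggered(source)  # initial run replays all 1000 changes

source.upsert(5, {"value": -1})
source.delete(7)

rows_copied = full.snapshot(source)            # copies every remaining row
changes_applied = incremental.triggered(source)  # applies only the 2 changes
```

In this model, a continuous sync would amount to invoking `triggered` as each change arrives rather than on demand. The trade-off is visible in the counts: after two small changes, the snapshot recopies the whole table while the incremental run touches only the changed keys, and both targets end up identical to the source.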
Organizations get full lineage from the ingestion point to the synced table and can use the Unity Catalog security model to ensure that only authorized applications query Lakebase. Lakebase databases can scale to handle demanding application workloads as each scenario requires.
Activate the data in the Lakehouse with Lakebase RETL, so that the Data Intelligence Platform can support operational apps at low latency.