Introduction to Lakeflow
At the Databricks Data + AI Summit 2025, Databricks unveiled Lakeflow, a revolutionary approach to data engineering. While many of us have used Delta Live Tables (DLT) for declarative pipeline management, Lakeflow goes further, offering a completely unified framework for batch, streaming, orchestration, and ingestion in one cohesive experience.
Lakeflow is built to be the backbone of reliable, scalable, and intelligent data movement across the Databricks Lakehouse. It's declarative, visual, powerful, and optimised for engineers and analysts alike.
Why Lakeflow Matters
In today’s fast-paced data landscape, organisations struggle with:
- Connecting multiple ingestion tools for batch and real-time data
- Maintaining pipeline logic across environments and teams
- Orchestrating pipelines with external schedulers
- Bridging the gap between data engineering and business consumption
Key Features of Lakeflow
Here are the flagship capabilities that make Lakeflow a powerful tool:
- Lakeflow Connect
  - A managed data ingestion engine for batch, streaming, CDC, and file-based sources
  - Works with sources like Kafka, Event Hubs, databases, and object storage
  - Fully governed via Unity Catalog
- Declarative Pipelines
  - Build pipelines using SQL or Python with a declarative approach (like DLT); a Python sketch follows this list
  - Auto-manages state, lineage, and error handling
  - Highly optimised for reliability and scaling
  - Lakeflow’s declarative pipeline engine is the open-source evolution of Delta Live Tables, now contributed to Apache Spark
- Lakeflow Designer
  - A drag-and-drop visual ETL builder
  - Ideal for analysts or less technical users
  - Enables rapid pipeline prototyping and collaboration across teams
- Jobs Orchestration
  - A native, scalable workflow orchestrator, removing the need for Airflow, Azure Data Factory, or external schedulers
  - Supports dependencies, parameterisation, and notifications
  - Orchestrate pipelines, notebooks, AI workflows, and apps
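To make the declarative model concrete, here is a minimal sketch of a two-table pipeline using the DLT-style Python API that Lakeflow's pipeline engine evolved from. The table names, landing path, and quality rule are illustrative assumptions, and the pipeline runtime supplies the `spark` session.

```python
# Minimal declarative pipeline sketch (DLT-style Python API).
# Assumptions: the classic `dlt` import, a hypothetical landing path,
# and a `spark` session provided by the pipeline runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders incrementally ingested from object storage")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")       # Auto Loader for file-based ingestion
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders/")      # hypothetical landing zone
    )

@dlt.table(comment="Cleaned orders with a basic quality gate")
@dlt.expect_or_drop("valid_amount", "amount > 0")   # engine tracks and reports violations
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")            # lineage between tables is inferred
        .withColumn("ingested_at", F.current_timestamp())
    )
```

The same pipeline can be written in SQL; either way the engine, not the author, manages checkpoints, retries, and dependency order.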
Where Is Lakeflow Useful?
Lakeflow fits perfectly into any stage of the modern data pipeline, particularly when:
- We need to ingest data from heterogeneous sources into a Lakehouse
- We are building incremental pipelines that need to run on schedules or triggers (a jobs-orchestration sketch follows this list)
- We want governance + transformation + lineage in one system
- We aim to democratise pipeline creation via Lakeflow Designer for data analysts
- We are modernising legacy ETL workloads built on tools like Informatica and SSIS
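For the scheduled and triggered runs mentioned above, here is a hedged sketch of defining a job programmatically with the Databricks SDK for Python. The pipeline ID, notebook path, cron expression, and email address are placeholders, and class names can vary slightly between SDK versions; the same job can equally be built in the Jobs UI or with Databricks Asset Bundles.

```python
# Sketch: orchestrating a pipeline refresh plus a downstream notebook
# with native Jobs, using the Databricks SDK for Python (databricks-sdk).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # authenticates from the environment/config profile

created = w.jobs.create(
    name="orders-daily",
    tasks=[
        # Refresh the declarative pipeline (placeholder pipeline ID).
        jobs.Task(
            task_key="refresh_pipeline",
            pipeline_task=jobs.PipelineTask(pipeline_id="<pipeline-id>"),
        ),
        # Publish a report only after the pipeline task succeeds.
        jobs.Task(
            task_key="publish_report",
            depends_on=[jobs.TaskDependency(task_key="refresh_pipeline")],
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/reports/publish"),
        ),
    ],
    # Built-in scheduling and notifications, no external scheduler required.
    schedule=jobs.CronSchedule(quartz_cron_expression="0 0 6 * * ?", timezone_id="UTC"),
    email_notifications=jobs.JobEmailNotifications(on_failure=["data-team@example.com"]),
)
print(f"Created job {created.job_id}")
```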
It’s built for enterprise-grade performance, developer productivity, and AI-readiness, making it future-proof for the GenAI era.
Lakeflow vs Delta Live Tables (DLT): What’s Different?
| Feature | Delta Live Tables (DLT) | Lakeflow |
| --- | --- | --- |
| Pipeline Type | Batch + Streaming | Batch, Streaming, CDC |
| Source Ingestion | Manual or external | Built-in with Lakeflow Connect |
| UI Experience | Code-first only | Visual UI via Lakeflow Designer |
| Orchestration | Requires Jobs or Workflows | Native orchestration included |
| Open Source | Closed | Declarative engine contributed to Apache Spark |
| Audience | Data Engineers | Engineers + Analysts + ML teams |
Think of Lakeflow as DLT++: not just an upgrade, but a platform expansion that unifies ingestion, transformation, and orchestration under one umbrella.
Final Thoughts
With Lakeflow, Databricks has set a new standard for data engineering. It’s no longer about cobbling together tools from different vendors. Instead, Lakeflow brings ingestion, pipeline design, scheduling, observability, and governance into a single, AI-native platform.
If you are already using Delta Live Tables - great. But it’s time to explore Lakeflow, especially if you:
- Need to scale pipelines across teams
- Need real-time ingestion at low latency
- Want less code and more productivity