Introduction to Lakeflow
At the Databricks Data + AI Summit 2025, Databricks unveiled Lakeflow, a revolutionary approach to data engineering. While many of us have used Delta Live Tables (DLT) for declarative pipeline management, Lakeflow goes further, offering a completely unified framework for batch, streaming, orchestration, and ingestion in one cohesive experience.
Lakeflow is built to be the backbone of reliable, scalable, and intelligent data movement across the Databricks Lakehouse. It's declarative, visual, powerful, and optimised for engineers and analysts alike.
Why Lakeflow Matters
In today’s fast-paced data landscape, organisations struggle with:
- Connecting multiple ingestion tools for batch and real-time data
- Maintaining pipeline logic across environments and teams
- Orchestrating pipelines with external schedulers
- Bridging the gap between data engineering and business consumption
Key Features of Lakeflow
Here are the flagship capabilities that make Lakeflow a powerful tool:
- Lakeflow Connect
  - A managed data ingestion engine for batch, streaming, CDC, and file-based sources
  - Works with sources like Kafka, Event Hubs, databases, and object storage
  - Fully governed via Unity Catalog
- Declarative Pipelines
  - Build pipelines using SQL or Python with a declarative approach (like DLT); a Python sketch follows this list
  - Auto-manages state, lineage, and error handling
  - Highly optimised for reliability and scaling
  - Lakeflow’s declarative pipeline engine is the open-source evolution of Delta Live Tables, now contributed to Apache Spark
- Lakeflow Designer
  - A drag-and-drop visual ETL builder
  - Ideal for analysts or less technical users
  - Enables rapid pipeline prototyping and collaboration across teams
- Jobs Orchestration
  - A native, scalable workflow orchestrator, removing the need for Airflow, Azure Data Factory, or external schedulers
  - Supports dependencies, parameterisation, and notifications
  - Orchestrate pipelines, notebooks, AI workflows, and apps
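To make the declarative model concrete, here is a minimal sketch of a two-table pipeline using the DLT-style Python API that Lakeflow's pipeline engine evolved from. The table names, landing path, and quality rule are illustrative assumptions, and the pipeline runtime supplies the `spark` session.

```python
# Minimal declarative pipeline sketch (DLT-style Python API).
# Assumptions: the classic `dlt` import, a hypothetical landing path,
# and a `spark` session provided by the pipeline runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders incrementally ingested from object storage")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")       # Auto Loader for file-based ingestion
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders/")      # hypothetical landing zone
    )

@dlt.table(comment="Cleaned orders with a basic quality gate")
@dlt.expect_or_drop("valid_amount", "amount > 0")   # engine tracks and reports violations
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")            # lineage between tables is inferred
        .withColumn("ingested_at", F.current_timestamp())
    )
```

The same pipeline can be written in SQL; either way the engine, not the author, manages checkpoints, retries, and dependency order.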
Where Is Lakeflow Useful?
Lakeflow fits perfectly into any stage of the modern data pipeline, particularly when:
- We need to ingest data from heterogeneous sources into a Lakehouse
- We are building incremental pipelines that need to run on schedules or triggers (a jobs-orchestration sketch follows this list)
- We want governance + transformation + lineage in one system
- We aim to democratise pipeline creation via Lakeflow Designer for data analysts
- We are modernising legacy ETL workloads built on tools like Informatica and SSIS
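For the scheduled and triggered runs mentioned above, here is a hedged sketch of defining a job programmatically with the Databricks SDK for Python. The pipeline ID, notebook path, cron expression, and email address are placeholders, and class names can vary slightly between SDK versions; the same job can equally be built in the Jobs UI or with Databricks Asset Bundles.

```python
# Sketch: orchestrating a pipeline refresh plus a downstream notebook
# with native Jobs, using the Databricks SDK for Python (databricks-sdk).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # authenticates from the environment/config profile

created = w.jobs.create(
    name="orders-daily",
    tasks=[
        # Refresh the declarative pipeline (placeholder pipeline ID).
        jobs.Task(
            task_key="refresh_pipeline",
            pipeline_task=jobs.PipelineTask(pipeline_id="<pipeline-id>"),
        ),
        # Publish a report only after the pipeline task succeeds.
        jobs.Task(
            task_key="publish_report",
            depends_on=[jobs.TaskDependency(task_key="refresh_pipeline")],
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/reports/publish"),
        ),
    ],
    # Built-in scheduling and notifications, no external scheduler required.
    schedule=jobs.CronSchedule(quartz_cron_expression="0 0 6 * * ?", timezone_id="UTC"),
    email_notifications=jobs.JobEmailNotifications(on_failure=["data-team@example.com"]),
)
print(f"Created job {created.job_id}")
```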
It’s built for enterprise-grade performance, developer productivity, and AI-readiness, making it future-proof for the GenAI era.
Lakeflow vs Delta Live Tables (DLT): What’s Different?
| Feature | Delta Live Tables (DLT) | Lakeflow |
| --- | --- | --- |
| Pipeline Type | Batch + Streaming | Batch, Streaming, CDC |
| Source Ingestion | Manual or external | Built-in with Lakeflow Connect |
| UI Experience | Code-first only | Visual UI via Lakeflow Designer |
| Orchestration | Requires Jobs or Workflows | Native orchestration included |
| Open Source | Closed | Declarative engine contributed to Apache Spark |
| Audience | Data Engineers | Engineers + Analysts + ML teams |
Think of Lakeflow as DLT++: not just an upgrade, but a platform expansion that unifies ingestion, transformation, and orchestration under one umbrella.
Final Thoughts
With Lakeflow, Databricks has set a new standard for data engineering. It’s no longer about cobbling together tools from different vendors. Instead, Lakeflow brings ingestion, pipeline design, scheduling, observability, and governance into a single, AI-native platform.
If you are already using Delta Live Tables - great. But it’s time to explore Lakeflow, especially if you:
- Need to scale pipelines across teams
- Need real-time ingestion at low latency
- Want less code and more productivity