Hello @john77,
Lakeflow ETL Pipelines give you a managed, declarative engine that understands your tables/flows and runs them with automatic dependency resolution, retries, and incremental semantics. Jobs are the general-purpose orchestrator: they can run SQL files (and many other task types), but they don't add the pipeline-smart behaviours by themselves.
Running SQL via a SQL task executes the statements, but it doesn't give you features like automatic DAG building from table dependencies, streaming, incremental MV refresh logic, AUTO CDC, or task/flow retry semantics. Those come from Lakeflow Declarative Pipelines.
Use ETL Pipelines when you want…
- Automatic orchestration of a data DAG: Pipelines analyse dependencies between flows/tables and run them in the right order with maximum parallelism, with no hand-built task graphs (see the first sketch after this list).
- Declarative, incremental processing: Write simple SQL/Python; the engine handles incremental MV refreshes and streaming ingestion without you coding watermarking/checkpointing logic.
- Native CDC & SCD: The AUTO CDC flow handles out-of-order events and supports SCD1/SCD2 with a few lines of code (see the second sketch after this list).
- SQL-first ETL inside or outside pipelines: You can define streaming tables/materialised views directly in Databricks SQL.
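
To make the first two points concrete, here is a minimal sketch of a pipeline source file in Python (the SQL form works the same way). The volume path, table names, and columns are placeholders for illustration; because orders_daily reads from orders_raw, the engine infers the dependency and schedules/refreshes both incrementally:

```python
import dlt
from pyspark.sql import functions as F

# Streaming ingestion with Auto Loader -- the engine manages checkpoints for you.
@dlt.table(comment="Raw orders ingested incrementally from cloud storage")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders")  # placeholder landing path
    )

# Downstream aggregate -- no explicit task graph needed; reading orders_raw
# is enough for the pipeline to run this after the ingestion flow.
@dlt.table(comment="Daily order totals")
def orders_daily():
    return (
        dlt.read("orders_raw")
        .groupBy("order_date")
        .agg(F.sum("amount").alias("total_amount"))
    )
```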
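
And for the CDC point, a rough SCD2 sketch using the Python CDC API (dlt.apply_changes; newer runtimes expose the same capability under the AUTO CDC name). The table, key, and column names here are assumptions, not your schema:

```python
import dlt
from pyspark.sql.functions import col

# Target table the CDC flow keeps in sync.
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",                      # table kept up to date
    source="customers_cdc",                         # change feed defined elsewhere in the pipeline
    keys=["customer_id"],                           # primary key(s)
    sequence_by=col("event_ts"),                    # orders out-of-order events
    apply_as_deletes=col("operation") == "DELETE",  # treat these rows as deletes
    except_column_list=["operation", "event_ts"],   # drop CDC metadata from the target
    stored_as_scd_type=2,                           # keep full history; use 1 for upsert-in-place
)
```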
Use Jobs when you want…
- General workflow orchestration across any task type: notebooks, Python scripts, dbt, SQL files, REST calls, and even "pipeline" tasks. It's the scheduler/automation layer for diverse workloads (see the sketch after this list).
- To run plain SQL files (queries, dashboards, alerts) against a SQL warehouse, with Git-versioned assets: useful for reports or one-off DDL/DML that doesn't need pipeline semantics.
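
As an illustration only (not a prescribed setup), here's a sketch with the Databricks Python SDK that chains a pipeline refresh and a Git-versioned SQL file task on a warehouse; the IDs, repo URL, and file path are placeholders you'd replace:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="nightly-reporting",
    tasks=[
        # Task 1: trigger the Lakeflow pipeline refresh.
        jobs.Task(
            task_key="refresh_pipeline",
            pipeline_task=jobs.PipelineTask(pipeline_id="<pipeline-id>"),
        ),
        # Task 2: run a SQL file from Git on a SQL warehouse, after the pipeline finishes.
        jobs.Task(
            task_key="daily_report_sql",
            depends_on=[jobs.TaskDependency(task_key="refresh_pipeline")],
            sql_task=jobs.SqlTask(
                warehouse_id="<warehouse-id>",
                file=jobs.SqlTaskFile(path="sql/daily_report.sql", source=jobs.Source.GIT),
            ),
        ),
    ],
    git_source=jobs.GitSource(
        git_url="https://github.com/<org>/<repo>",
        git_provider=jobs.GitProvider.GIT_HUB,
        git_branch="main",
    ),
)
print(job.job_id)
```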
Please let me know if you have any further questions.