Data Engineering

Why ETL Pipelines when Jobs exist?

john77
New Contributor

I notice that ETL Pipelines let you run declarative SQL syntax such as DLT tables, but you can do the same with Jobs if you use SQL as the task type. So why, and when, should you use ETL Pipelines?

2 REPLIES

K_Anudeep
Databricks Employee

Hello @john77 ,

Lakeflow ETL Pipelines give you a managed, declarative engine that understands your tables/flows and runs them with automatic dependency resolution, retries, and incremental semantics. Jobs are the general-purpose orchestrator: they can run SQL files (and many other task types), but they don't add the pipeline-smart behaviours by themselves.

Running SQL via a SQL task executes the statements, but it doesn't add features such as automatic DAG building from table dependencies, streaming, incremental MV refresh logic, AUTO CDC, or task-to-flow retry semantics. Those come from Lakeflow Declarative Pipelines.
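For illustration, here is a minimal sketch of the declarative SQL you would put inside a pipeline (the table names and path are made up); the engine works out that the second definition depends on the first and handles ordering, retries, and incremental refresh for you:

  -- Declarative pipeline SQL: the pipeline builds the DAG from these definitions.
  CREATE OR REFRESH STREAMING TABLE bronze_orders AS
  SELECT * FROM STREAM read_files('/Volumes/demo/raw/orders', format => 'json');

  -- Depends on bronze_orders; the pipeline refreshes it incrementally.
  CREATE OR REFRESH MATERIALIZED VIEW silver_orders AS
  SELECT order_id, customer_id, CAST(amount AS DECIMAL(10, 2)) AS amount
  FROM bronze_orders
  WHERE order_id IS NOT NULL;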

Use ETL Pipelines when you want…

  • Automatic orchestration of a data DAG: Pipelines analyse dependencies between flows/tables and run them in the right order with maximum parallelism, with no hand-built task graphs (see the Databricks documentation).

  • Declarative, incremental processing: Write simple SQL/Python; the engine does incremental MV refreshes and streaming ingestion without you coding watermarking/checkpointing logic (see the Databricks documentation).

  • Native CDC & SCD: The AUTO CDC flow handles out-of-order events and supports SCD1/SCD2 with a few lines of code. Databricks Documentation

  • SQL-first ETL inside or outside pipelines: You can define streaming tables/materialised views directly in Databricks SQL.
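To make the CDC point concrete, here is a rough sketch of an SCD Type 2 flow using the APPLY CHANGES INTO syntax (the older spelling of the AUTO CDC flow mentioned above); the change-feed name and columns are hypothetical:

  -- Target streaming table that will hold the SCD Type 2 history.
  CREATE OR REFRESH STREAMING TABLE customers_scd2;

  -- Out-of-order events are resolved using the SEQUENCE BY column.
  APPLY CHANGES INTO customers_scd2
  FROM STREAM(customers_cdc_feed)
  KEYS (customer_id)
  APPLY AS DELETE WHEN operation = 'DELETE'
  SEQUENCE BY event_ts
  COLUMNS * EXCEPT (operation)
  STORED AS SCD TYPE 2;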

Use Jobs when you want…

  • General workflow orchestration across any task type: notebooks, Python scripts, dbt, SQL files, REST calls, and even "pipeline" tasks. It's the scheduler/automation layer for diverse workloads (see the Databricks documentation).

  • Running plain SQL files (queries, dashboards, alerts) against a SQL warehouse, with Git-versioned assets; useful for reports or one-off DDL/DML that doesn't need pipeline semantics.
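By contrast, here is a sketch of the kind of plain SQL file a Jobs SQL task might run against a SQL warehouse (the table names are invented); ordering, retries, and incremental logic are then yours to manage:

  -- One-off DML a scheduled SQL task could execute; no pipeline semantics involved.
  MERGE INTO analytics.daily_orders AS t
  USING (
    SELECT order_date,
           COUNT(*)    AS order_count,
           SUM(amount) AS revenue
    FROM raw.orders
    WHERE order_date = date_sub(current_date(), 1)
    GROUP BY order_date
  ) AS s
  ON t.order_date = s.order_date
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *;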

 

Please let me know if you have any further questions.

 

saurabh18cs
Honored Contributor II

Hi @john77 

SQL task type: simple, one-off SQL operations or batch jobs, or when you need to orchestrate a mix of notebooks, Python/Scala code, and SQL in a single workflow.

Lakeflow Declarative Pipelines: complex, production ETL jobs that require lineage, monitoring, event logs, data quality rules, CDC, and incremental processing, with execution plans optimized automatically.
