<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why ETL Pipelines and Jobs in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133689#M49901</link>
    <description>&lt;P&gt;Basically the pipelines can be seen as a replacement for notebooks.&amp;nbsp; You define all your logic in DLT functions and call these.&lt;BR /&gt;You get a nice lineage view etc., but it comes at a price: less flexibility, and you are pushed into a rather strict way of working.&lt;/P&gt;&lt;P&gt;It is definitely not for everybody, especially if you have already been using Databricks for years.&lt;/P&gt;&lt;P&gt;Perhaps Databricks should write an actual technical blog, without marketing, about what is really possible and what is not.&lt;/P&gt;</description>
    <pubDate>Fri, 03 Oct 2025 12:51:28 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2025-10-03T12:51:28Z</dc:date>
    <item>
      <title>Why ETL Pipelines and Jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133410#M49836</link>
      <description>&lt;P&gt;I notice that ETL Pipelines let you run declarative SQL syntax such as DLT tables, but you can do the same with Jobs if you use SQL as your task type. So why and when should I use ETL Pipelines?&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2025 22:15:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133410#M49836</guid>
      <dc:creator>john77</dc:creator>
      <dc:date>2025-09-30T22:15:10Z</dc:date>
    </item>
    <item>
      <title>Re: Why ETL Pipelines and Jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133418#M49838</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/187826"&gt;@john77&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Lakeflow ETL Pipelines give you a managed, declarative engine that understands your tables/flows and runs them with automatic dependency resolution, retries, and incremental semantics. Jobs are the general-purpose orchestrator: they can run SQL files (and many other task types), but they don’t add the pipeline-smart behaviours by themselves.&lt;/P&gt;
&lt;P&gt;Running SQL via a &lt;STRONG&gt;SQL task&lt;/STRONG&gt; executes statements, but &lt;STRONG&gt;doesn’t add&lt;/STRONG&gt; features like automatic DAG building from table dependencies, streaming, incremental MV refresh logic, AUTO CDC, and task→flow retry semantics. Those &lt;EM&gt;come from&lt;/EM&gt; Lakeflow Declarative Pipelines.&lt;/P&gt;
&lt;H3 data-start="432" data-end="472"&gt;Use &lt;STRONG data-start="440" data-end="457"&gt;ETL Pipelines&lt;/STRONG&gt; when you want&lt;/H3&gt;
&lt;UL data-start="473" data-end="1603"&gt;
&lt;LI data-start="473" data-end="773"&gt;
&lt;P data-start="475" data-end="773"&gt;&lt;STRONG data-start="475" data-end="516"&gt;Automatic orchestration of a data DAG&lt;/STRONG&gt;: Pipelines analyse dependencies between flows/tables and run them in the right order with max parallelism—no hand-built task graphs. &lt;SPAN class="" data-state="closed"&gt;&lt;SPAN class="ms-1 inline-flex max-w-full items-center relative top-[-0.094rem] animate-[show_150ms_ease-in]" data-testid="webpage-citation-pill"&gt;&lt;A class="flex h-4.5 overflow-hidden rounded-xl px-2 text-[9px] font-medium transition-colors duration-150 ease-in-out text-token-text-secondary! bg-[#F4F4F4]! dark:bg-[#303030]!" href="https://docs.databricks.com/aws/en/dlt/concepts" target="_blank" rel="noopener"&gt;&lt;SPAN class="relative start-0 bottom-0 flex h-full w-full items-center"&gt;&lt;SPAN class="flex h-4 w-full items-center justify-between overflow-hidden"&gt;&lt;SPAN class="max-w-[15ch] grow truncate overflow-hidden text-center"&gt;Databricks Documentation&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI data-start="774" data-end="999"&gt;
&lt;P data-start="776" data-end="999"&gt;&lt;STRONG data-start="776" data-end="815"&gt;Declarative, incremental processing&lt;/STRONG&gt;: Write simple SQL/Python; the engine does incremental MV refreshes and streaming ingestion without you coding watermarking/checkpointing logic. &lt;SPAN class="" data-state="closed"&gt;&lt;SPAN class="ms-1 inline-flex max-w-full items-center relative top-[-0.094rem] animate-[show_150ms_ease-in]" data-testid="webpage-citation-pill"&gt;&lt;A class="flex h-4.5 overflow-hidden rounded-xl px-2 text-[9px] font-medium transition-colors duration-150 ease-in-out text-token-text-secondary! bg-[#F4F4F4]! dark:bg-[#303030]!" href="https://docs.databricks.com/aws/en/dlt/concepts" target="_blank" rel="noopener"&gt;&lt;SPAN class="relative start-0 bottom-0 flex h-full w-full items-center"&gt;&lt;SPAN class="flex h-4 w-full items-center justify-between overflow-hidden"&gt;&lt;SPAN class="max-w-[15ch] grow truncate overflow-hidden text-center"&gt;Databricks Documentation&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI data-start="1000" data-end="1162"&gt;
&lt;P data-start="1002" data-end="1162"&gt;&lt;STRONG data-start="1002" data-end="1022"&gt;Native CDC &amp;amp; SCD&lt;/STRONG&gt;: The &lt;STRONG data-start="1028" data-end="1040"&gt;AUTO CDC&lt;/STRONG&gt; flow handles out-of-order events and supports SCD1/SCD2 with a few lines of code. &lt;SPAN class="" data-state="closed"&gt;&lt;SPAN class="ms-1 inline-flex max-w-full items-center relative top-[-0.094rem] animate-[show_150ms_ease-in]" data-testid="webpage-citation-pill"&gt;&lt;A class="flex h-4.5 overflow-hidden rounded-xl px-2 text-[9px] font-medium transition-colors duration-150 ease-in-out text-token-text-secondary! bg-[#F4F4F4]! dark:bg-[#303030]!" href="https://docs.databricks.com/aws/en/dlt/concepts" target="_blank" rel="noopener"&gt;&lt;SPAN class="relative start-0 bottom-0 flex h-full w-full items-center"&gt;&lt;SPAN class="flex h-4 w-full items-center justify-between overflow-hidden"&gt;&lt;SPAN class="max-w-[15ch] grow truncate overflow-hidden text-center"&gt;Databricks Documentation&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI data-start="1000" data-end="1162"&gt;&lt;SPAN class="" data-state="closed"&gt;&lt;SPAN class="ms-1 inline-flex max-w-full items-center relative top-[-0.094rem] animate-[show_150ms_ease-in]" data-testid="webpage-citation-pill"&gt;&lt;SPAN class="relative start-0 bottom-0 flex h-full w-full items-center"&gt;&lt;SPAN class="flex h-4 w-full items-center justify-between overflow-hidden"&gt;&lt;SPAN class="max-w-[15ch] grow truncate overflow-hidden text-center"&gt;&lt;STRONG data-start="1371" data-end="1416"&gt;SQL-first ETL inside or outside pipelines&lt;/STRONG&gt;: You can define streaming tables/materialised views directly in Databricks SQL&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
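&lt;P&gt;As a rough sketch (all table, column, and path names here are made up), a declarative pipeline in SQL might look like this; the engine infers the bronze-to-silver ordering from the table reference, so there is no hand-built task graph:&lt;/P&gt;
&lt;PRE&gt;-- Bronze: incremental file ingestion via Auto Loader
CREATE OR REFRESH STREAMING TABLE bronze_orders
AS SELECT * FROM STREAM read_files('/Volumes/demo/raw/orders', format =&gt; 'json');

-- Silver: the engine sees the dependency on bronze_orders and schedules it after bronze;
-- the expectation drops rows that fail the data quality rule
CREATE OR REFRESH STREAMING TABLE silver_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT order_id, CAST(amount AS DECIMAL(10,2)) AS amount
FROM STREAM bronze_orders;&lt;/PRE&gt;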
&lt;H3 data-start="1605" data-end="1636"&gt;Use &lt;STRONG data-start="1613" data-end="1621"&gt;Jobs&lt;/STRONG&gt; when you want…&lt;/H3&gt;
&lt;UL data-start="1637" data-end="2274"&gt;
&lt;LI data-start="1637" data-end="1882"&gt;
&lt;P data-start="1639" data-end="1882"&gt;&lt;STRONG data-start="1639" data-end="1673"&gt;General workflow orchestration&lt;/STRONG&gt; across &lt;EM data-start="1681" data-end="1686"&gt;any&lt;/EM&gt; task type: notebooks, Python scripts, dbt, SQL files, REST calls, &lt;STRONG data-start="1753" data-end="1760"&gt;and&lt;/STRONG&gt; even “pipeline” tasks. It’s the scheduler/automation layer for diverse workloads. &lt;SPAN class="" data-state="closed"&gt;&lt;SPAN class="ms-1 inline-flex max-w-full items-center relative top-[-0.094rem] animate-[show_150ms_ease-in]" data-testid="webpage-citation-pill"&gt;&lt;A class="flex h-4.5 overflow-hidden rounded-xl px-2 text-[9px] font-medium transition-colors duration-150 ease-in-out text-token-text-secondary! bg-[#F4F4F4]! dark:bg-[#303030]!" href="https://docs.databricks.com/aws/en/jobs/" target="_blank" rel="noopener"&gt;&lt;SPAN class="relative start-0 bottom-0 flex h-full w-full items-center"&gt;&lt;SPAN class="flex h-4 w-full items-center justify-between overflow-hidden"&gt;&lt;SPAN class="max-w-[15ch] grow truncate overflow-hidden text-center"&gt;Databricks Documentation&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI data-start="2054" data-end="2274"&gt;
&lt;P data-start="2056" data-end="2274"&gt;&lt;STRONG data-start="2056" data-end="2079"&gt;Run plain SQL files&lt;/STRONG&gt; (queries, dashboards, alerts) against a SQL warehouse, with Git-versioned assets—useful for reports or one-off DDL/DML that doesn’t need pipeline semantics.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
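&lt;P&gt;For the second bullet, a minimal sketch of what a Jobs SQL task might run against a SQL warehouse (schema and table names are hypothetical); note this is plain DDL/DML with no pipeline semantics:&lt;/P&gt;
&lt;PRE&gt;-- One-off reporting SQL, versioned in Git and scheduled by a Job
CREATE TABLE IF NOT EXISTS reporting.daily_summary AS
SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
FROM sales.orders
GROUP BY order_date;&lt;/PRE&gt;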
&lt;P&gt;Please let me know if you have any further questions.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Oct 2025 08:53:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133418#M49838</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-10-01T08:53:45Z</dc:date>
    </item>
    <item>
      <title>Re: Why ETL Pipelines and Jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133423#M49841</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/187826"&gt;@john77&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;SQL task type: simple, one-off SQL operations&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;or batch jobs, or when you need to orchestrate a mix of notebooks, Python/Scala code, and SQL in a single workflow.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Lakeflow Declarative Pipelines: complex, production ETL jobs that require lineage,&lt;/STRONG&gt; monitoring, event logs, data quality rules, CDC, incremental processing, and automatic plan optimisation.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Oct 2025 09:28:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133423#M49841</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-10-01T09:28:10Z</dc:date>
    </item>
    <item>
      <title>Re: Why ETL Pipelines and Jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133594#M49889</link>
      <description>&lt;P&gt;Sounds fair, but that being said, I do not think it stops us from writing DLT code as a SQL task in a Job. Once the job is evaluated/dry-run/run (with a SQL task containing a DLT definition), I immediately see an automatically created pipeline in the UI. (Streaming tables&amp;nbsp;created in&amp;nbsp;Databricks SQL&amp;nbsp;have a type of&amp;nbsp;MV/ST.)&lt;BR /&gt;&lt;BR /&gt;So don't I get the benefits you outlined in this case too, but without the effort of creating an ETL pipeline? Which benefits do I miss?&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2025 05:03:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133594#M49889</guid>
      <dc:creator>john77</dc:creator>
      <dc:date>2025-10-03T05:03:46Z</dc:date>
    </item>
    <item>
      <title>Re: Why ETL Pipelines and Jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133620#M49893</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/187826"&gt;@john77&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;When a &lt;STRONG&gt;SQL task creates an ST/MV&lt;/STRONG&gt;, it works fine for a few independent tables. You &lt;EM&gt;do&lt;/EM&gt; get incremental refresh, retries, and an auto-created (implicit) pipeline per object.&lt;BR /&gt;What you miss: it is harder to mix SQL + Python, or to add things like AUTO CDC and expectations across the &lt;EM&gt;whole&lt;/EM&gt; pipeline.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;There is no single DAG that guarantees bronze → silver → gold (medallion) ordering as &lt;/SPAN&gt;&lt;EM&gt;one&lt;/EM&gt;&lt;SPAN&gt; unit; each object has its own run/metrics, and there is no single “pipeline run” view.&lt;/SPAN&gt;&lt;/P&gt;
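&lt;P&gt;For illustration (a rough sketch; the source/target names and columns are invented), the AUTO CDC flow is declared inside a pipeline roughly like this, which is not expressible as a plain SQL task:&lt;/P&gt;
&lt;PRE&gt;-- Target streaming table, kept as SCD Type 2 history
CREATE OR REFRESH STREAMING TABLE customers;

-- AUTO CDC handles out-of-order events by ordering on SEQUENCE BY
CREATE FLOW customers_cdc AS AUTO CDC INTO customers
FROM STREAM(raw.customers_updates)
KEYS (customer_id)
SEQUENCE BY event_ts
STORED AS SCD TYPE 2;&lt;/PRE&gt;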
&lt;P&gt;&lt;SPAN&gt;All of the above are only possible with an ETL pipeline.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2025 06:38:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133620#M49893</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-10-03T06:38:23Z</dc:date>
    </item>
    <item>
      <title>Re: Why ETL Pipelines and Jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133689#M49901</link>
      <description>&lt;P&gt;Basically the pipelines can be seen as a replacement for notebooks.&amp;nbsp; You define all your logic in DLT functions and call these.&lt;BR /&gt;You get a nice lineage view etc., but it comes at a price: less flexibility, and you are pushed into a rather strict way of working.&lt;/P&gt;&lt;P&gt;It is definitely not for everybody, especially if you have already been using Databricks for years.&lt;/P&gt;&lt;P&gt;Perhaps Databricks should write an actual technical blog, without marketing, about what is really possible and what is not.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Oct 2025 12:51:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-etl-pipelines-and-jobs/m-p/133689#M49901</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2025-10-03T12:51:28Z</dc:date>
    </item>
  </channel>
</rss>

