<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks with Airflow in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126596#M47820</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176875"&gt;@sandelic&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;If your workload is mainly Databricks-centered, stick with Workflows. They are easy to manage and integrate directly with Databricks notebooks and jobs.&lt;BR /&gt;But sometimes your workload requires complex orchestration and scheduling across many different systems, and Airflow was made exactly for this. Airflow allows for extensive customization: you can author and schedule workflows programmatically in Python (you can do something similar with Databricks Asset Bundles, but Airflow has more options), and it &lt;SPAN&gt;supports a wide range of integrations with different systems, including cloud platforms, databases, and more.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In short: if you’re running primarily Spark-based workflows, Databricks Workflows are a great choice. However, if your data pipelines involve several different systems working together, Airflow is probably a better fit for your needs. It has a steeper learning curve, though.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 27 Jul 2025 20:28:20 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-07-27T20:28:20Z</dc:date>
    <item>
      <title>Databricks with Airflow</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126593#M47819</link>
      <description>&lt;P&gt;Hi there,&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm trying to understand the advantages of using Airflow operators to orchestrate Databricks notebooks, given that Databricks already offers its own workflow solution. Could someone please explain the benefits?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Stefan&lt;/P&gt;</description>
      <pubDate>Sun, 27 Jul 2025 19:18:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126593#M47819</guid>
      <dc:creator>sandelic</dc:creator>
      <dc:date>2025-07-27T19:18:39Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks with Airflow</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126596#M47820</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176875"&gt;@sandelic&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;If your workload is mainly Databricks-centered, stick with Workflows. They are easy to manage and integrate directly with Databricks notebooks and jobs.&lt;BR /&gt;But sometimes your workload requires complex orchestration and scheduling across many different systems, and Airflow was made exactly for this. Airflow allows for extensive customization: you can author and schedule workflows programmatically in Python (you can do something similar with Databricks Asset Bundles, but Airflow has more options), and it &lt;SPAN&gt;supports a wide range of integrations with different systems, including cloud platforms, databases, and more.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In short: if you’re running primarily Spark-based workflows, Databricks Workflows are a great choice. However, if your data pipelines involve several different systems working together, Airflow is probably a better fit for your needs. It has a steeper learning curve, though.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 27 Jul 2025 20:28:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126596#M47820</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-27T20:28:20Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks with Airflow</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126598#M47821</link>
      <description>&lt;P&gt;Thanks for clarifying&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;. Could you elaborate on '&lt;SPAN&gt;several different systems working together&lt;/SPAN&gt;'? Specifically, does this imply that Airflow is recommended when other tools, like dbt Core, are already in use (for instance, if Databricks Workflows integrate with dbt Cloud)?&lt;/P&gt;</description>
      <pubDate>Sun, 27 Jul 2025 21:32:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126598#M47821</guid>
      <dc:creator>sandelic</dc:creator>
      <dc:date>2025-07-27T21:32:03Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks with Airflow</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126599#M47822</link>
      <description>&lt;P&gt;Sure. In many real-world data pipelines you don’t just process data in one tool like Databricks; instead, you're interacting with a variety of systems at different stages of the pipeline. So, let's say your workload requires orchestrating the following steps:&lt;/P&gt;&lt;P&gt;1. S3 File Upload → (AWS S3 Sensor)&lt;BR /&gt;2. Load File into Snowflake → (SnowflakeOperator)&lt;BR /&gt;3. Run Data Quality Checks → (Custom PythonOperator)&lt;BR /&gt;4. Trigger Databricks Notebook → (DatabricksSubmitRunOperator)&lt;BR /&gt;5. Push Result to REST API → (HttpOperator)&lt;BR /&gt;6. Run a Spark Job on EMR → (EmrAddStepsOperator)&lt;BR /&gt;7. Send Slack Notification → (SlackWebhookOperator)&lt;/P&gt;&lt;P&gt;As you can see, in the above scenario it could be better to use Airflow because it has a rich ecosystem of pre-built operators (Slack, AWS, GCP, Azure, Kubernetes, etc.).&lt;BR /&gt;Also, you can write your own operators for custom needs (maybe you need to send some kind of custom notification after a workflow succeeds or fails; you can do this by writing your own custom operator, or check whether one that fits your need already exists).&lt;/P&gt;&lt;P&gt;Regarding your second question, not necessarily. Databricks Workflows integrates with dbt Core really well (so does Airflow), and the product team keeps adding new features with each release.&lt;BR /&gt;So, if you don't need a really complex orchestration scenario, stick with Workflows. They're simpler, and you don't need to set up a whole infrastructure to run them (as you do with Airflow).&lt;BR /&gt;Otherwise, if you need to handle custom things or orchestrate systems that you can't reach from Workflows, choose Airflow. But as I said, Airflow definitely has a steeper learning curve.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/jobs/how-to/use-dbt-in-workflows" target="_blank" rel="noopener"&gt;Use dbt transformations in Lakeflow Jobs | Databricks Documentation&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 27 Jul 2025 22:19:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126599#M47822</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-27T22:19:24Z</dc:date>
    </item>
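The seven-step pipeline listed in the reply above could be sketched as an Airflow DAG. This is a minimal, hypothetical sketch rather than a definitive implementation: the connection IDs, bucket name, notebook path, and cluster settings are placeholders, it covers only steps 1, 4, and 7 (the S3 sensor, the Databricks notebook run, and the Slack notification), and it assumes the Airflow amazon, databricks, and slack provider packages are installed.

```python
# Hypothetical sketch of part of the multi-system pipeline from the thread.
# Assumes apache-airflow plus the amazon, databricks, and slack provider
# packages; all connection IDs, paths, and cluster settings are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

with DAG(
    dag_id="multi_system_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Step 1: wait for a file to land in S3.
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="my-landing-bucket",    # placeholder
        bucket_key="incoming/data.csv",     # placeholder
        aws_conn_id="aws_default",
    )

    # Step 4: trigger a Databricks notebook on a new job cluster.
    run_notebook = DatabricksSubmitRunOperator(
        task_id="run_notebook",
        databricks_conn_id="databricks_default",
        new_cluster={
            "spark_version": "15.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Shared/etl_notebook"},  # placeholder
    )

    # Step 7: notify Slack once the notebook run finishes.
    notify = SlackWebhookOperator(
        task_id="notify_slack",
        slack_webhook_conn_id="slack_default",
        message="Pipeline finished successfully.",
    )

    wait_for_file >> run_notebook >> notify
```

The remaining steps (the Snowflake load, quality checks, REST push, and EMR job) would slot into the same dependency chain as additional tasks using their respective operators.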
    <item>
      <title>Re: Databricks with Airflow</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126600#M47823</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;for the thorough explanation.&lt;/P&gt;</description>
      <pubDate>Sun, 27 Jul 2025 23:51:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126600#M47823</guid>
      <dc:creator>sandelic</dc:creator>
      <dc:date>2025-07-27T23:51:17Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks with Airflow</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126613#M47824</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176875"&gt;@sandelic&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;No problem! If the answer was helpful, please consider marking it as a solution. That way we help other community members find the answer to similar questions faster.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Jul 2025 05:53:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-with-airflow/m-p/126613#M47824</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-28T05:53:27Z</dc:date>
    </item>
  </channel>
</rss>

