07-27-2025 12:18 PM
Hi there,
I'm trying to understand the advantages of using Airflow operators to orchestrate Databricks notebooks, given that Databricks already offers its own workflow solution. Could someone please explain the benefits?
Thanks,
Stefan
07-27-2025 01:28 PM - edited 07-27-2025 01:28 PM
Hi @sandelic ,
If your workload is mainly Databricks-centered, then stick to Workflows. They are easy to manage, and Workflows integrate directly with Databricks notebooks and jobs.
But sometimes your workload requires complex orchestration and scheduling across many different systems, and Airflow was made exactly for this. Airflow allows for extensive customization: you can author and schedule workflows programmatically in Python (you can do something similar with Databricks Asset Bundles (DAB), but Airflow has more options), and it supports a wide range of integrations with different systems, including cloud platforms, databases, and more.
I would say that if you're running primarily Spark-based workflows, Databricks Workflows are a great choice. However, if your data pipelines involve several different systems working together, Airflow is probably a better fit for your needs. It does have a steeper learning curve, though.
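To give a feel for the programmatic side, here's a minimal sketch of an Airflow DAG that triggers a Databricks notebook. It assumes the apache-airflow-providers-databricks package is installed; the connection id, cluster spec, and notebook path are just placeholders:

```python
# Minimal sketch (Airflow 2.x + apache-airflow-providers-databricks).
# Connection id, cluster spec, and notebook path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="databricks_notebook_example",
    start_date=datetime(2025, 7, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_notebook = DatabricksSubmitRunOperator(
        task_id="run_notebook",
        databricks_conn_id="databricks_default",  # Databricks connection defined in Airflow
        new_cluster={
            "spark_version": "15.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Shared/example_notebook"},
    )
```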
07-27-2025 02:32 PM
Thanks for clarifying @szymon_dybczak. Could you elaborate on 'several different systems working together'? Specifically, does this imply that Airflow is recommended when other tools, like dbt-core, are already in use (for instance, if Databricks Workflows integrate with dbt-cloud)?
07-27-2025 03:16 PM - edited 07-27-2025 03:19 PM
Sure. In many real-world data pipelines you don't just process data in one tool like Databricks; instead, you interact with a variety of systems at different stages of the pipeline. So, let's say your workload requires orchestrating the following steps:
1. S3 File Upload (AWS S3 Sensor)
2. Load File into Snowflake (SnowflakeOperator)
3. Run Data Quality Checks (Custom PythonOperator)
4. Trigger Databricks Notebook (DatabricksSubmitRunOperator)
5. Push Result to REST API (HttpOperator)
6. Run a Spark Job on EMR
7. Send Slack Notification (SlackWebhookOperator)
As you can see, in the above scenario Airflow could be the better choice, because it has a rich ecosystem of pre-built operators (Slack, AWS, GCP, Azure, Kubernetes, etc.).
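A skeleton DAG wiring a few of those steps together could look roughly like this (I've skipped the EMR and REST API steps to keep it short; the connection ids, bucket, SQL, cluster id, and notebook path are placeholders, and each operator comes from its own provider package):

```python
# Skeleton of a multi-system pipeline; all ids, paths, and SQL are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator


def run_quality_checks():
    # Placeholder for whatever custom data quality logic you need.
    print("running data quality checks")


with DAG(
    dag_id="multi_system_pipeline",
    start_date=datetime(2025, 7, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="my-landing-bucket",
        bucket_key="incoming/data.csv",
        aws_conn_id="aws_default",
    )

    load_to_snowflake = SnowflakeOperator(  # newer provider versions may prefer SQLExecuteQueryOperator
        task_id="load_to_snowflake",
        snowflake_conn_id="snowflake_default",
        sql="COPY INTO raw.my_table FROM @my_stage/incoming/data.csv",
    )

    quality_checks = PythonOperator(
        task_id="quality_checks",
        python_callable=run_quality_checks,
    )

    run_databricks_notebook = DatabricksSubmitRunOperator(
        task_id="run_databricks_notebook",
        databricks_conn_id="databricks_default",
        existing_cluster_id="1234-567890-abcde123",
        notebook_task={"notebook_path": "/Shared/transformations"},
    )

    notify_slack = SlackWebhookOperator(  # parameter names vary slightly across provider versions
        task_id="notify_slack",
        slack_webhook_conn_id="slack_default",
        message="Pipeline finished successfully.",
    )

    wait_for_file >> load_to_snowflake >> quality_checks >> run_databricks_notebook >> notify_slack
```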
Also, you can write your own operators for custom needs. Maybe you need to send some kind of custom notification after a workflow succeeds or fails - you can do this by writing your own custom operator, or first check whether one already exists that fits your need.
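For example, a rough sketch of a custom operator for a bespoke notification might look like this (the endpoint and payload shape are made-up placeholders for whatever internal system you'd call):

```python
# Rough sketch of a custom operator; the endpoint and payload are hypothetical.
import requests
from airflow.models import BaseOperator


class CustomNotificationOperator(BaseOperator):
    """Posts a message to an internal (hypothetical) notification service."""

    def __init__(self, endpoint: str, message: str, **kwargs):
        super().__init__(**kwargs)
        self.endpoint = endpoint
        self.message = message

    def execute(self, context):
        # Send the notification and fail the task if the service rejects it.
        response = requests.post(self.endpoint, json={"text": self.message}, timeout=10)
        response.raise_for_status()
        self.log.info("Notification sent, status %s", response.status_code)
```

You then use it in a DAG like any built-in operator, e.g. CustomNotificationOperator(task_id="notify", endpoint="https://notifications.internal/send", message="Pipeline finished"), and you can give it a trigger_rule (e.g. all_done) so it runs whether the upstream tasks succeed or fail.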
Regarding your second question: not necessarily. Databricks Workflows integrates with dbt-core really well (and so does Airflow), and the product team keeps adding new features with each release.
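For the Airflow side, one common pattern (just a sketch, not the only way) is to shell out to the dbt CLI with a BashOperator. This assumes dbt-core and the right adapter are installed on the worker, and the project path is a placeholder:

```python
# Hedged sketch: running dbt-core from Airflow via the CLI; paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_core_example",
    start_date=datetime(2025, 7, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/airflow/dbt/my_project && dbt run --profiles-dir .",
    )

    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/airflow/dbt/my_project && dbt test --profiles-dir .",
    )

    dbt_run >> dbt_test
```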
So, if you don't need a really complex orchestration scenario, stick to Workflows. They're simpler and you don't need to set up a whole infrastructure to run them (as you do in the case of Airflow).
Otherwise, if you need to handle custom things or orchestrate systems that you can't reach from Workflows, then choose Airflow. But as I said, Airflow definitely has a steeper learning curve.
Use dbt transformations in Lakeflow Jobs | Databricks Documentation
07-27-2025 04:51 PM
Thanks @szymon_dybczak for the thorough explanation.
07-27-2025 10:53 PM
Hi @sandelic ,
No problem. If the answer was helpful, please consider marking it as a solution. This way we help other community members find the solution to similar questions faster.