Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How does Task Orchestration compare to Airflow (for Databricks-only jobs)?

arthur_wang
New Contributor

One of my clients has been orchestrating Databricks notebooks using Airflow + REST API. They're curious about the pros and cons of switching these jobs to Databricks jobs with Task Orchestration.

I know there are all sorts of considerations - for example, if they're already running Airflow for non-Databricks jobs, they'll most likely continue using Airflow to centralize workflow management. But I'm curious about people's experiences with Task Orchestration, and what features or benefits it might have over Airflow for jobs that are 100% Databricks anyway.

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @arthur.wang! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first. Otherwise, I will follow up shortly with a response.

Lauri
New Contributor III

We have a setup with both types of orchestration. The split comes down to ease of use. For example, ETL jobs run via Airflow, but notebooks that users just want to schedule for themselves are scheduled within Databricks.

It's just so much easier to click "schedule", choose a cluster, and be done with it than to write an Airflow task or a DAG. Data scientists don't want to learn how to use specific scheduling software; they just want their notebook to run every morning.

Shourya
New Contributor III

@Kaniz Fatma

Hello Kaniz, I'm currently working with a major enterprise client that is choosing between Airflow and Databricks for job scheduling. Our entire code base is in Databricks, and we are trying to figure out what complexities might come up if we go ahead with Databricks. Could you please provide any details on Databricks vs. Airflow? We are mainly interested in retries, notifications, task dependencies, and real-time use cases. If possible, please also cover how each option might or might not benefit different roles, such as data analysts and data scientists.
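
To make the features named above concrete, here is a minimal sketch of what a two-task job definition could look like as a Databricks Jobs API 2.1-style payload, covering a retry policy, a failure notification, and a task dependency. This is only an illustration, not an answer from Databricks: the job name, notebook paths, email address, and cron expression are hypothetical, the compute settings a real request would need are left out, and the code only builds and prints the payload without calling any API.

```python
import json


def build_job_spec(notify_email: str) -> dict:
    """Build a Jobs API 2.1-style payload with two dependent notebook tasks.

    All names and paths below are hypothetical placeholders.
    Compute settings are intentionally omitted from this sketch.
    """
    return {
        "name": "nightly-etl",  # hypothetical job name
        # Notification: email this address if the job fails.
        "email_notifications": {"on_failure": [notify_email]},
        # Schedule: every day at 06:00 UTC (Quartz cron syntax).
        "schedule": {
            "quartz_cron_expression": "0 0 6 * * ?",
            "timezone_id": "UTC",
        },
        "tasks": [
            {
                "task_key": "ingest",
                "notebook_task": {"notebook_path": "/Jobs/ingest"},
                # Retries: up to 2 retries, waiting 60 s between attempts.
                "max_retries": 2,
                "min_retry_interval_millis": 60000,
            },
            {
                "task_key": "transform",
                # Dependency: runs only after "ingest" succeeds.
                "depends_on": [{"task_key": "ingest"}],
                "notebook_task": {"notebook_path": "/Jobs/transform"},
            },
        ],
    }


spec = build_job_spec("oncall@example.com")
print(json.dumps(spec, indent=2))
```

In Airflow, the same dependency would typically be written in a DAG file in Python, which is the extra step Lauri's reply describes wanting to avoid for simple notebook schedules.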
