
How does Task Orchestration compare to Airflow (for Databricks-only jobs)?

arthur_wang
New Contributor

One of my clients has been orchestrating Databricks notebooks using Airflow + REST API. They're curious about the pros/cons of switching these jobs to Databricks jobs with Task Orchestration.

I know there are all sorts of considerations - for example, if they're already running Airflow for non-Databricks jobs, they'll most likely continue using Airflow to centralize workflow management. But I'm curious about people's experiences with Task Orchestration, and what features or benefits it might have over Airflow for jobs that are 100% Databricks anyway.

3 REPLIES

Kaniz
Community Manager

Hi @arthur.wang! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the community have an answer to your question first. Or else I will follow up shortly with a response.

Lauri
New Contributor III

We have a setup with both types of orchestration. The split comes down to ease of use. For example, ETL jobs run via Airflow, but notebooks that users just want to schedule for themselves are scheduled within Databricks.

It's just so much easier to click "schedule", choose a cluster, and be done with it than to write an Airflow task or a DAG. Data scientists don't want to learn how to use specific scheduling software - they just want to have their notebook run every morning.

Shourya
New Contributor III

@Kaniz Fatma

Hello Kaniz, I'm currently working with a major enterprise client that is choosing between Airflow and Databricks for job scheduling. Our entire code base is in Databricks, and we are trying to figure out what complexities might come up if we go ahead with Databricks. Could you please provide any details on this issue, Databricks vs. Airflow? We are mainly interested in retries, notifications, task dependencies, and real-time use cases. If possible, please also include how each option might or might not benefit different roles, such as data analysts and data scientists.
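
For readers weighing the same features, here is a minimal sketch of how retries, failure notifications, and task dependencies can be declared in a multi-task Databricks job. It only builds the request body in the shape used by the Jobs API 2.1 (`/api/2.1/jobs/create`) and sends nothing. The job name, notebook paths, cluster ID placeholder, email address, and the helper `build_job_payload` are all hypothetical and would need to be replaced with real values; treat it as an illustration rather than a recommended configuration.

```python
import json


def build_job_payload(notify_email: str) -> dict:
    """Build a request body for a two-task job (Jobs API 2.1 shape).

    All names, paths, and IDs below are placeholders for illustration.
    """
    return {
        "name": "nightly-etl-example",  # hypothetical job name
        # Run every day at 06:00 UTC (Quartz cron: sec min hour dom month dow).
        "schedule": {
            "quartz_cron_expression": "0 0 6 * * ?",
            "timezone_id": "UTC",
        },
        # Notification: email this address when a run fails.
        "email_notifications": {"on_failure": [notify_email]},
        "tasks": [
            {
                "task_key": "ingest",
                "notebook_task": {"notebook_path": "/Repos/example/ingest"},
                "existing_cluster_id": "<cluster-id>",  # placeholder
                # Retries: up to 2 retries, waiting at least 60 s between them.
                "max_retries": 2,
                "min_retry_interval_millis": 60000,
            },
            {
                "task_key": "transform",
                # Dependency: runs only after "ingest" succeeds.
                "depends_on": [{"task_key": "ingest"}],
                "notebook_task": {"notebook_path": "/Repos/example/transform"},
                "existing_cluster_id": "<cluster-id>",  # placeholder
                "max_retries": 1,
            },
        ],
    }


def to_json(payload: dict) -> str:
    """Serialize the payload as it would appear in the request body."""
    return json.dumps(payload, indent=2)
```

In Airflow, roughly the same information would be spread across a DAG definition, operator arguments, and callbacks. Here it sits in one job definition, which can also be created through the Jobs UI instead of the API.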
