Data Engineering

How to trigger a Databricks job only after multiple other jobs have completed

dhruvs2
New Contributor

We have a use case where Job C should start only after both Job A and Job B have successfully completed.

In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs.
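
For context, here's a rough sketch of the kind of setup we use today (Airflow 2.x; DAG and task names are just illustrative):

```python
# Rough sketch of our current Airflow setup (DAG/task names are illustrative).
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.sensors.external_task import ExternalTaskSensor

with DAG("job_c_dag", start_date=datetime(2024, 1, 1), catchup=False) as dag:
    # Wait for the DAG runs that wrap Job A and Job B to finish.
    wait_for_a = ExternalTaskSensor(
        task_id="wait_for_job_a",
        external_dag_id="job_a_dag",
        external_task_id=None,  # None = wait for the whole external DAG run
    )
    wait_for_b = ExternalTaskSensor(
        task_id="wait_for_job_b",
        external_dag_id="job_b_dag",
        external_task_id=None,
    )
    run_job_c = EmptyOperator(task_id="run_job_c")  # stand-in for the task that triggers Job C

    [wait_for_a, wait_for_b] >> run_job_c
```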

Is there a way to configure something similar in Databricks, so that Job C automatically triggers only after Job A and Job B are finished?

I looked through the documentation but couldn't find anything specific for this scenario. Any guidance or best practices would be appreciated!

3 REPLIES

BS_THE_ANALYST
Esteemed Contributor II

Hey @dhruvs2 

You could use Lakeflow Jobs for this. You can add a job as a task:

[Screenshot: adding a job as a task in the Jobs UI]

Then you can follow the docs from here: https://docs.databricks.com/aws/en/jobs/. There are loads of great sections and tutorials.

To answer your specific question:
When configuring a task, you just set the Depends on field:

[Screenshot: the Depends on field in the task configuration]

And the Run if dependencies setting:

[Screenshot: the Run if dependencies setting with All succeeded selected]

Above, you can see I selected All succeeded. You can select All done instead if you only care that the upstream tasks finished, regardless of whether they succeeded.
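
If you'd rather define this in code than in the UI, here's a minimal sketch using the Databricks SDK for Python (job IDs and names below are placeholders, and it's worth double-checking the field names against the SDK docs; the same structure applies if you use the Jobs API or Asset Bundles):

```python
# Minimal sketch: an orchestrator job that runs Job A and Job B in parallel,
# then runs Job C only if both succeeded. Job IDs are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up auth from the environment / .databrickscfg

orchestrator = w.jobs.create(
    name="orchestrate_a_b_then_c",
    tasks=[
        jobs.Task(
            task_key="job_a",
            run_job_task=jobs.RunJobTask(job_id=111),  # Job A
        ),
        jobs.Task(
            task_key="job_b",
            run_job_task=jobs.RunJobTask(job_id=222),  # Job B
        ),
        jobs.Task(
            task_key="job_c",
            run_job_task=jobs.RunJobTask(job_id=333),  # Job C
            depends_on=[
                jobs.TaskDependency(task_key="job_a"),
                jobs.TaskDependency(task_key="job_b"),
            ],
            # "Run if dependencies" = All succeeded. Use RunIf.ALL_DONE if you
            # only care that A and B finished, regardless of success.
            run_if=jobs.RunIf.ALL_SUCCESS,
        ),
    ],
)
print(f"Created orchestrator job {orchestrator.job_id}")
```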

All the best,
BS

dhruvs2
New Contributor

Hi @BS_THE_ANALYST, thanks for the response.
From what I understand, it seems we'd need to maintain a separate job or pipeline in Databricks to orchestrate everything. Is that correct?

BS_THE_ANALYST
Esteemed Contributor II

Hi @dhruvs2 😀.

A Lakeflow Job consists of tasks. The tasks can be things like notebooks or other jobs. If you want to orchestrate many jobs, I'd agree that having a job to do this is your best bet 😀. Then you can set up the dependencies as you require.

If you get stuck with anything, give me a shout 🙂

Once you've got the hang of how the jobs work, you can then look into parameterisation where you can start making things really dynamic! https://docs.databricks.com/aws/en/jobs/job-parameters
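
As a quick taste, here's a hedged sketch of kicking off a run with job-level parameters via the Python SDK (the job ID and parameter names are made up; your tasks can then read the values, e.g. with dbutils.widgets.get in a notebook):

```python
# Sketch: trigger a run with job-level parameters (job ID and parameter names are made up).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

run = w.jobs.run_now(
    job_id=123,
    job_parameters={"processing_date": "2025-01-01", "env": "dev"},
).result()  # .result() blocks until the run finishes; drop it to fire and forget
print(run.state.result_state)
```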

Don't forget about monitoring/observability either: https://docs.databricks.com/aws/en/jobs/monitor#view-jobs-and-pipelines 
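
And if you ever want to check on runs programmatically rather than in the UI, a minimal sketch (again, the job ID is a placeholder):

```python
# Sketch: print the state of the most recent runs of a job (job ID is a placeholder).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for run in w.jobs.list_runs(job_id=123, limit=5):
    print(run.run_id, run.state.life_cycle_state, run.state.result_state)
```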

In terms of compute for running the jobs, I'd say Serverless is your best bet. If not, and you're using classic compute, it's recommended to use job compute. Here's a good article to read more about compute considerations: https://docs.databricks.com/aws/en/jobs/compute#what-is-the-recommended-compute-for-each-task
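
If you do end up on classic compute, here's a rough sketch of attaching a job cluster to a notebook task (the runtime version, node type, and notebook path are placeholders, so pick what's available in your workspace):

```python
# Sketch: a notebook task on its own job cluster (runtime, node type, and path are placeholders).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

w.jobs.create(
    name="job_on_job_compute",
    tasks=[
        jobs.Task(
            task_key="my_notebook",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/path/to/notebook"),
            new_cluster=compute.ClusterSpec(
                spark_version="15.4.x-scala2.12",  # placeholder; pick a supported runtime
                node_type_id="i3.xlarge",          # placeholder; node types are cloud-specific
                num_workers=2,
            ),
        ),
    ],
)
```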

All the best,
BS