
Robust/complex scheduling with dependency within Databricks?

RIDBX
Contributor

Thanks for reviewing my thread. I'd like to explore robust/complex scheduling with dependencies within Databricks.

Traditional scheduling frameworks allow robust dependency and condition settings across multiple tiers. How can we do that with Databricks scheduling?

For example:

- We have an HR application in tier 1: 100 jobs, start time 12 AM.
- We have Finance applications in tier 2: 125 jobs, start time 10 AM plus completion of all HR applications (100 jobs).

These can run daily or weekly.

How do we do this?

Are there any docs or whitepapers on this subject?


Thanks for your insights.

3 REPLIES

pradeep_singh
Contributor III
Interesting scenario. Here is what I think you can do:

Tier 1 job: a single job that contains 100 tasks (each of which triggers a job), scheduled to run at 12:00 AM.
Tier 2 job: a single job that contains 125 tasks, scheduled to run at 10:00 AM.
All 125 tasks in Tier 2 depend on a single SQL Alert task that runs a SQL query against the system tables to determine whether the Tier 1 job has completed. As soon as that job completes, the SQL Alert task's dependency is fulfilled and the Tier 2 job is ready to run at its scheduled 10 AM start.
You don't need 100 or 125 tasks to run each job. You can simplify the design by using a For Each loop that reads per-job parameters from a JSON array, combined with If/Else conditions.
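
For illustration, here is a minimal sketch of the kind of completion check that SQL Alert could run, written as Python/Spark. The system table name and columns follow the system.lakeflow.job_run_timeline schema, and the job ID is a placeholder you would replace:

# Check whether the Tier 1 job finished successfully today.
# Assumes the system.lakeflow.job_run_timeline system table is enabled
# in your workspace; TIER1_JOB_ID is a placeholder.
TIER1_JOB_ID = 123456789

row = spark.sql(f"""
    SELECT COUNT(*) AS n
    FROM system.lakeflow.job_run_timeline
    WHERE job_id = {TIER1_JOB_ID}
      AND result_state = 'SUCCEEDED'
      AND period_end_time >= current_date()
""").first()

tier1_done = row["n"] > 0  # the alert condition would key off this count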

 

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

pradeep_singh
Contributor III

Further reading -

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

SteveOstrowski
Databricks Employee

Hi @RIDBX,

Databricks Lakeflow Jobs has several features that let you build exactly this kind of tiered, dependency-driven orchestration natively. Here is how I would approach your HR (Tier 1) and Finance (Tier 2) scenario.

OPTION 1: SINGLE ORCHESTRATOR JOB WITH RUN JOB TASKS

The cleanest approach is to create one top-level orchestrator job that coordinates everything. Databricks supports a "Run Job" task type that lets a task inside one job trigger and wait for another job to complete before downstream tasks proceed.

Your design would look like this:

Orchestrator Job (scheduled daily at 12 AM)
|
|-- [HR_Job_1]  (Run Job task -> triggers HR Job 1)
|-- [HR_Job_2]  (Run Job task -> triggers HR Job 2)
|-- ...
|-- [HR_Job_100] (Run Job task -> triggers HR Job 100)
|
|-- (all 100 HR tasks must succeed)
|
|-- [Finance_Job_1]   (Run Job task -> triggers Finance Job 1)
|-- [Finance_Job_2]   (Run Job task -> triggers Finance Job 2)
|-- ...
|-- [Finance_Job_125] (Run Job task -> triggers Finance Job 125)

Each of the 100 HR tasks is a "Run Job" task that triggers its respective standalone HR job. The 125 Finance tasks are also "Run Job" tasks, and each one is configured to depend on ALL 100 HR tasks completing successfully. This is done through the task dependency graph (DAG) in the Jobs UI.

To set up a Run Job task:
1. In your orchestrator job, click "Add task"
2. Set the Type to "Run Job"
3. Select the target job from the dropdown
4. Set the dependencies to the upstream tasks that must complete first

A single Databricks job supports up to 1,000 tasks, so your 225 total tasks (100 + 125) fit well within that limit.

Documentation: https://docs.databricks.com/aws/en/jobs/run-job
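
If you would rather define the orchestrator programmatically than click through the UI, here is a minimal sketch using the Databricks SDK for Python. The job IDs, job name, and cron expression are placeholders:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Placeholder lists of your existing standalone job IDs.
hr_job_ids = [101, 102, 103]        # ... up to 100 HR jobs
finance_job_ids = [201, 202, 203]   # ... up to 125 Finance jobs

# One Run Job task per HR job; no dependencies, so they all start at 12 AM.
hr_tasks = [
    jobs.Task(task_key=f"HR_Job_{i}", run_job_task=jobs.RunJobTask(job_id=jid))
    for i, jid in enumerate(hr_job_ids, start=1)
]

# Every Finance task depends on ALL HR tasks succeeding.
finance_tasks = [
    jobs.Task(
        task_key=f"Finance_Job_{i}",
        run_job_task=jobs.RunJobTask(job_id=jid),
        depends_on=[jobs.TaskDependency(task_key=t.task_key) for t in hr_tasks],
    )
    for i, jid in enumerate(finance_job_ids, start=1)
]

w.jobs.create(
    name="tiered_orchestrator",  # placeholder name
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 0 * * ?",  # daily at 12:00 AM
        timezone_id="UTC",
    ),
    tasks=hr_tasks + finance_tasks,
)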

OPTION 2: USE FOR EACH TASKS TO SIMPLIFY

If your HR and Finance jobs follow similar patterns and can be parameterized, you can use the "For Each" task to dramatically simplify the orchestrator. Instead of defining 100 individual Run Job tasks for HR, you define one For Each task that iterates over a list of job configurations.

Orchestrator Job (scheduled daily at 12 AM)
|
|-- [HR_ForEach] (For Each task, iterates over 100 HR job configs)
|       |-- nested task: Run Job (parameterized)
|
|-- (HR_ForEach must succeed)
|
|-- [Finance_ForEach] (For Each task, iterates over 125 Finance job configs)
        |-- nested task: Run Job (parameterized)

You can pass in a JSON array of parameters (job IDs, config values, etc.) and set a concurrency level to control how many run in parallel. The Finance For Each task depends on the HR For Each task, so it only starts after all HR iterations complete.

Documentation: https://docs.databricks.com/aws/en/jobs/for-each
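
As a rough sketch of the For Each pattern in the Python SDK, assuming the 100 HR jobs have been consolidated into a single parameterized template job (the template job IDs and the "dataset" parameter are assumptions):

import json
from databricks.sdk.service import jobs

# JSON array of per-iteration parameters (names and values are assumptions).
hr_inputs = json.dumps([{"dataset": f"hr_table_{i}"} for i in range(1, 101)])

hr_foreach = jobs.Task(
    task_key="HR_ForEach",
    for_each_task=jobs.ForEachTask(
        inputs=hr_inputs,
        concurrency=10,  # run up to 10 HR iterations in parallel
        task=jobs.Task(
            task_key="HR_ForEach_iteration",
            run_job_task=jobs.RunJobTask(
                job_id=111111,  # placeholder: parameterized HR template job
                job_parameters={"dataset": "{{input.dataset}}"},
            ),
        ),
    ),
)

# Finance For Each only starts after every HR iteration has finished.
finance_foreach = jobs.Task(
    task_key="Finance_ForEach",
    depends_on=[jobs.TaskDependency(task_key="HR_ForEach")],
    for_each_task=jobs.ForEachTask(
        inputs=json.dumps([{"dataset": f"fin_table_{i}"} for i in range(1, 126)]),
        concurrency=10,
        task=jobs.Task(
            task_key="Finance_ForEach_iteration",
            run_job_task=jobs.RunJobTask(
                job_id=222222,  # placeholder: parameterized Finance template job
                job_parameters={"dataset": "{{input.dataset}}"},
            ),
        ),
    ),
)

# Both tasks would then go into w.jobs.create(tasks=[hr_foreach, finance_foreach], ...)
# as in the Option 1 sketch above.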

OPTION 3: TWO SEPARATE JOBS WITH TABLE-UPDATE OR CONTINUOUS TRIGGERS

If you prefer to keep Tier 1 and Tier 2 as separate jobs, you can use trigger-based coordination:

1. Schedule the HR job (Tier 1) at 12 AM.
2. Configure the Finance job (Tier 2) with a table-update trigger or use a webhook/API-based approach:

 - Have the last task in the HR job write a "completion marker" to a Delta table.
 - Configure the Finance job with a table-update trigger that monitors that marker table.
 - The Finance job will automatically start when the marker table is updated.

Alternatively, the last task in the HR job can call the Databricks Jobs API (POST /api/2.1/jobs/run-now) to programmatically trigger the Finance job.

Documentation: https://docs.databricks.com/aws/en/jobs/triggers
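
A minimal sketch of both variants in Python; the marker table name and the Finance job ID are placeholders:

# Variant A: the last HR task (running in a notebook) writes a completion
# marker that a table-update trigger on the Finance job watches.
spark.sql("""
    INSERT INTO ops.hr_completion_marker   -- placeholder table name
    SELECT current_timestamp() AS completed_at
""")

# Variant B: trigger the Finance job directly via the Jobs API.
# The SDK call below wraps POST /api/2.1/jobs/run-now.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
FINANCE_JOB_ID = 222222  # placeholder for your actual Finance job ID
w.jobs.run_now(job_id=FINANCE_JOB_ID)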

CONDITIONAL EXECUTION WITH RUN IF

For more granular control, each downstream task can use "Run if dependencies" to specify conditions such as:
- "All succeeded" (default): run only if every upstream dependency succeeded
- "At least one succeeded": run if at least one upstream task succeeded
- "None failed": run if no upstream tasks failed
- "All done": run regardless of upstream success/failure (useful for cleanup)

This is configured per task in the dependency settings.

Documentation: https://docs.databricks.com/aws/en/jobs/run-if
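
In the Jobs API / Python SDK these options map to the task's run_if field. A small sketch of a cleanup task that always runs (task keys and notebook path are placeholders):

from databricks.sdk.service import jobs

cleanup = jobs.Task(
    task_key="cleanup",
    depends_on=[jobs.TaskDependency(task_key="Finance_ForEach")],
    run_if=jobs.RunIf.ALL_DONE,  # others: ALL_SUCCESS, AT_LEAST_ONE_SUCCESS, NONE_FAILED
    notebook_task=jobs.NotebookTask(notebook_path="/Jobs/cleanup"),  # placeholder path
)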

IF/ELSE BRANCHING

If you need conditional logic (for example, skip Tier 2 entirely if a certain condition is met), the If/Else condition task evaluates expressions using task values, job parameters, or dynamic references. For example, you could check whether a quality gate passed before proceeding.

Documentation: https://docs.databricks.com/aws/en/jobs/if-else
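
A sketch of such a quality-gate branch with the Python SDK; the upstream task key, the task value name, and the comparison value are assumptions:

from databricks.sdk.service import jobs

# Condition task comparing a task value set by an upstream "quality_gate" task.
quality_check = jobs.Task(
    task_key="quality_check",
    condition_task=jobs.ConditionTask(
        op=jobs.ConditionTaskOp.EQUAL_TO,
        left="{{tasks.quality_gate.values.status}}",  # assumed task value
        right="passed",
    ),
)

# Tier 2 hangs off the "true" outcome of the condition.
finance = jobs.Task(
    task_key="Finance_Tier",
    depends_on=[jobs.TaskDependency(task_key="quality_check", outcome="true")],
    run_job_task=jobs.RunJobTask(job_id=222222),  # placeholder job ID
)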

MY RECOMMENDATION FOR YOUR SCENARIO

For 100 HR jobs followed by 125 Finance jobs with a hard dependency, I would recommend Option 1 or Option 2:

- If your jobs are diverse with different configurations, use Option 1 with individual Run Job tasks in a single orchestrator.
- If your jobs can be parameterized into a common template, use Option 2 with For Each tasks for a cleaner, more maintainable design.

Both approaches give you a single place to monitor the entire workflow, with a clear visual DAG showing the dependency between tiers.

For additional reading on Lakeflow Jobs orchestration:
https://docs.databricks.com/aws/en/jobs
https://docs.databricks.com/aws/en/jobs/sql (SQL task types including alerts)

* This reply was drafted with an agent system I built, which researches and drafts responses from the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update the draft when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.