Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Need for additional flow control (shortcomings with "run if dependencies")

kenmyers-8451
Contributor II

Maybe there is a way to do this that my team can't figure out, but we have a process that looks roughly like this:

[image: kenmyers8451_0-1769797889919.png]

The main focus here is job1 and job2, but in theory this issue extends to any number of linear jobs. The idea behind this is:

  1. job2 depends on job1 but only if job1 runs
  2. if job1 doesn't run, job2 can still run (imagine job1 is a long process that we may have already executed in an earlier run, so we turn its conditional off while leaving job2's conditional on; this is a simple example, and assume there are other cases where one might do this).
  3. job2 can also optionally be turned off

So we've been trying a few different things but it seems like none of the offered "run if dependencies" allow us to do what is described above (without maybe adding another conditional that I'm working on testing next). Below are the examples where it breaks down:

Job2 uses "all_succeeded" run if: job1_cond_false + job2_cond_true = job2 doesn't run (it should run, because job2's condition is true)

Job2 uses "at_least_one_succeeded" run if: job1 success + job2_cond_false = job2 runs (it shouldn't run, because job2's condition is false)

Job2 uses "none_failed" run if: job1 success + job2_cond_false = job2 runs (it shouldn't run, because job2's condition is false)

This last one is the most interesting: when a conditional outputs false, it blocks an "all_succeeded" run if, as if false were not a success. Yet when the run if checks "none_failed" and the output is false, it is not treated as a failure either. So the conditional's false output exists as a third state that is neither success nor failure, and one we can't control for.
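To make the three cases above concrete, here is a small, purely illustrative simulation of the "run if" rules, modeling the conditional's false outcome as a third "excluded" state (the state names are mine, not the exact strings the Jobs UI uses):

```python
def should_run(run_if, dep_states):
    """Return True if a downstream task would run under the given rule.

    dep_states is a list of upstream outcomes, each one of
    "succeeded", "failed", or "excluded" (the in-between state).
    """
    if run_if == "all_succeeded":
        return all(s == "succeeded" for s in dep_states)
    if run_if == "at_least_one_succeeded":
        return any(s == "succeeded" for s in dep_states)
    if run_if == "none_failed":
        return all(s != "failed" for s in dep_states)
    raise ValueError(f"unknown rule: {run_if}")

# Case 1: job1's conditional false (excluded), job2's conditional true:
print(should_run("all_succeeded", ["excluded", "succeeded"]))        # False
# Cases 2 and 3: job1 succeeded, job2's conditional false (excluded):
print(should_run("at_least_one_succeeded", ["succeeded", "excluded"]))  # True
print(should_run("none_failed", ["succeeded", "excluded"]))             # True
```

Under this model, "excluded" is invisible to "none_failed" and "at_least_one_succeeded" but fatal to "all_succeeded", which reproduces all three broken cases.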

--- 

If you're curious I also tried a linear design like below:

[image: kenmyers8451_1-1769798621711.png]

But in this case it doesn't even make it to job2_cond, despite using "none_failed". It's as if excluded does count as a failure here. And if excluded is a subset of failure, then I think a false conditional output should also be a subset of failure so that "none_failed" treats it as such.


kenmyers-8451
Contributor II

The only thing that's coming to me right now is that maybe this can be done by making each task its own job (which it is in my example, but normally isn't) and putting the conditional into each job. But I don't think this is super feasible, because it would mean turning any singular task into its own job.

The team I'm working with previously did this by putting the control in the notebook itself, but the problem with that is it has to wait for the cluster to spin up first, and some of their jobs use a couple of different clusters that shouldn't all need to spin up just to check whether a notebook should run.

SteveOstrowski
Databricks Employee

Hi @kenmyers-8451,

You are describing a real gap in how "Run if dependencies" interacts with If/else condition task outcomes, and I want to walk through the current tooling so you can find a workable pattern.

UNDERSTANDING THE BEHAVIOR

When an If/else condition task evaluates to false, the tasks on the "true" branch get an "Excluded" status. In the dependency system, "Excluded" is treated as a successful completion, not as a failure. That is why:

- "All succeeded" blocks downstream tasks when a dependency is excluded (because the task itself did not actually run and produce a success).
- "None failed" and "At least one succeeded" let downstream tasks through, because excluded is not counted as a failure.

This creates the in-between state you identified, where the conditional outcome is neither a clear success nor a failure from the perspective of "Run if" logic.

RECOMMENDED PATTERN: USE IF/ELSE TASKS WITH TASK VALUES

The most flexible approach to get the behavior you want is to combine If/else condition tasks with task values. Here is the pattern:

1. Create a "gate" notebook task that runs before your conditional logic. This task checks whatever condition determines whether Job1 should run, and sets a task value accordingly:

dbutils.jobs.taskValues.set(key="job1_should_run", value="true")

or

dbutils.jobs.taskValues.set(key="job1_should_run", value="false")

2. Add an If/else condition task that checks this task value:

Operand 1: {{tasks.gate_task.values.job1_should_run}}
Operator: ==
Operand 2: true

3. On the "true" branch, run Job1 (your actual workload).

4. For Job2, add a separate If/else condition task that checks Job2's own condition independently, so Job2 is not blocked by Job1's branch outcome.

5. Set Job2's dependency to use "All done" on the overall workflow structure so it always gets a chance to evaluate, regardless of what happened upstream.

The key insight is to decouple the dependency chain. Instead of making Job2 depend directly on Job1 with "Run if" logic, make both Job1 and Job2 depend on a shared upstream gate, and give each its own If/else condition task for independent evaluation.
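A minimal sketch of the gate notebook's decision logic, assuming hypothetical job parameters named "job1_enabled" and "job2_enabled". The logic is kept in a plain function so it can be tested outside Databricks; the dbutils calls (which only exist inside a Databricks notebook) are shown commented out:

```python
def decide_gates(params):
    """Normalize enable flags into the string values the If/else
    condition tasks will compare against ("true" / "false")."""
    def flag(name):
        return "true" if params.get(name, "true").lower() == "true" else "false"
    return {
        "job1_should_run": flag("job1_enabled"),
        "job2_should_run": flag("job2_enabled"),
    }

# Inside the gate notebook task (Databricks only):
# gates = decide_gates({
#     "job1_enabled": dbutils.widgets.get("job1_enabled"),
#     "job2_enabled": dbutils.widgets.get("job2_enabled"),
# })
# for key, value in gates.items():
#     dbutils.jobs.taskValues.set(key=key, value=value)

print(decide_gates({"job1_enabled": "false"}))
```

Both If/else tasks then read their flag via `{{tasks.gate_task.values.<key>}}`, so each job's gate is evaluated independently of the other's branch outcome.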

EXAMPLE WORKFLOW STRUCTURE

[gate_task] (notebook: sets task values for both conditions)
|
+---> [if_else_job1] (checks job1 condition)
| |
| +---> [job1_task] (true branch)
|
+---> [if_else_job2] (checks job2 condition independently)
|
+---> [job2_task] (true branch, depends on if_else_job2=true)

With this structure:
- If Job1's condition is false, Job1 is excluded, but Job2 evaluates its own condition independently.
- If Job2's condition is false, Job2 is excluded regardless of Job1.
- If both conditions are true, both run (in parallel or sequentially depending on your needs).
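For reference, the structure above can be expressed as a Jobs API 2.1 `tasks` fragment. This is a hedged sketch (cluster settings omitted, notebook paths hypothetical); double-check the field names against the Jobs API docs before using it:

```python
# Task graph for the decoupled-gate pattern, as a jobs/create fragment.
tasks = [
    {"task_key": "gate_task",
     "notebook_task": {"notebook_path": "/Workspace/gates"}},   # hypothetical path
    {"task_key": "if_else_job1",
     "depends_on": [{"task_key": "gate_task"}],
     "condition_task": {
         "op": "EQUAL_TO",
         "left": "{{tasks.gate_task.values.job1_should_run}}",
         "right": "true"}},
    {"task_key": "job1_task",
     # Runs only on the "true" outcome of its own gate.
     "depends_on": [{"task_key": "if_else_job1", "outcome": "true"}],
     "notebook_task": {"notebook_path": "/Workspace/job1"}},    # hypothetical path
    {"task_key": "if_else_job2",
     # Depends on the shared gate, not on job1's branch.
     "depends_on": [{"task_key": "gate_task"}],
     "condition_task": {
         "op": "EQUAL_TO",
         "left": "{{tasks.gate_task.values.job2_should_run}}",
         "right": "true"}},
    {"task_key": "job2_task",
     "depends_on": [{"task_key": "if_else_job2", "outcome": "true"}],
     "notebook_task": {"notebook_path": "/Workspace/job2"}},    # hypothetical path
]
```

Note that neither job2 task depends on job1_task, which is what keeps job1's excluded state from ever reaching job2's "Run if" evaluation.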

HANDLING THE CASE WHERE JOB2 DEPENDS ON JOB1'S OUTPUT

If Job2 actually needs Job1's output when Job1 runs, but should proceed without it when Job1 is skipped, you can extend the pattern:

1. Have Job1 set a task value on completion:

dbutils.jobs.taskValues.set(key="job1_completed", value="true")

2. Add a second If/else condition task before Job2 that checks both conditions:

First condition: {{tasks.gate_task.values.job2_should_run}} == true

3. Set Job2's "Run if dependencies" to "All done" so it always evaluates after the upstream tasks complete (or get excluded).

4. Inside Job2's notebook, use dbutils.jobs.taskValues.get() with a default value to gracefully handle the case where Job1 did not run:

job1_result = dbutils.jobs.taskValues.get(
    taskKey="job1_task",
    key="job1_completed",
    default="false",
    debugValue="false",
)

ALTERNATIVE: SPLIT INTO SEPARATE JOBS WITH RUN JOB TASKS

If the conditional logic becomes too complex within a single workflow, consider splitting Job1 and Job2 into separate Databricks Jobs and using "Run Job" tasks to orchestrate them from a parent job. This gives you full programmatic control:

1. Parent job runs a notebook that evaluates all conditions.
2. Based on the results, it triggers child jobs via the Run Job task type.
3. Each child job runs independently with its own success/failure handling.

This pattern scales well when you have many conditional branches.
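A sketch of that parent-orchestrator step, with hypothetical child job IDs. Only the payload construction is shown and tested here; actually triggering a run would POST each payload to the workspace's `/api/2.1/jobs/run-now` endpoint with a bearer token:

```python
def build_run_now_payload(job_id, notebook_params=None):
    """Build a jobs/run-now request body for one child job."""
    payload = {"job_id": job_id}
    if notebook_params:
        payload["notebook_params"] = notebook_params
    return payload

# Evaluate all conditions up front, then trigger only the children
# whose condition is true (IDs and results are hypothetical):
conditions = {"job1": True, "job2": False}
child_job_ids = {"job1": 111, "job2": 222}
to_trigger = [build_run_now_payload(child_job_ids[name])
              for name, run in conditions.items() if run]
print(to_trigger)
```

Because each child is a full job, it gets its own run history, retries, and notifications, which is often worth the extra orchestration overhead.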

DOCUMENTATION REFERENCES

- If/else condition tasks: https://docs.databricks.com/en/jobs/if-else.html
- Run if dependencies: https://docs.databricks.com/en/jobs/conditional-tasks.html
- Task values: https://docs.databricks.com/en/jobs/share-task-context.html
- Run Job task: https://docs.databricks.com/en/jobs/run-job-task.html

I hope one of these patterns helps you get the flow control behavior you need. The If/else task combined with task values gives you the most granular control within a single workflow.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.