Job Notifications specifically on Succeeded with Failures

juan_maedo
New Contributor II

Hi everyone,

I have a set of jobs whose last task always runs, regardless of whether the previous tasks failed, using the "All done" execution dependency.
Now that we are moving to production and want to enable notifications, there is no option to notify on "Succeeded with failures", only "Success" or "Failure".
From what I can tell, this is because such a run is classified as "Success". Is there any way to distinguish the two?

Enabling notifications on both success and failure floods the inbox when everything goes well, with the risk that nobody actually checks them.

Thanks in advance!

ACCEPTED SOLUTION

mark_ott
Databricks Employee

Databricks does not provide a direct way to distinguish, or send notifications for, a "Succeeded with failures" state at the job level. When the last (leaf) task runs with the "All done" dependency and succeeds, the job run is classified as "Success" even if upstream tasks failed. This is a limitation of the notification system, which only supports alerting on "Success" or "Failure", not this intermediate state.

Job Status Logic and Leaf Tasks

  • The overall job status is determined by the outcome of the leaf tasks (those without downstream dependencies). If a final task runs with the "All done" dependency and succeeds, the whole job is labeled "Success" even if other tasks failed.

  • Consequently, upstream task failures only surface in job-level notifications if the failing tasks are themselves leaves, or if the overall state becomes "Failure" because no subsequent leaf task succeeds. A minimal job-spec fragment reproducing this masking behavior is sketched after this list.
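To make the masking concrete, here is a minimal sketch of a two-task job definition (Jobs API 2.1 shape, expressed as a Python dict; the job name, task keys, and notebook paths are hypothetical). Because the leaf task declares "run_if": "ALL_DONE", it runs and can succeed even when the upstream task fails, and the run is then reported as "Success":

```python
import json

# Minimal two-task job definition (Jobs API 2.1 shape; names hypothetical).
# "finalize" always runs because of run_if=ALL_DONE; if it succeeds, the
# whole run is reported as "Success" even when "transform" failed.
job_settings = {
    "name": "nightly-pipeline",
    "tasks": [
        {
            "task_key": "transform",
            "notebook_task": {"notebook_path": "/Jobs/transform"},
        },
        {
            "task_key": "finalize",
            "depends_on": [{"task_key": "transform"}],
            "run_if": "ALL_DONE",  # run regardless of upstream outcome
            "notebook_task": {"notebook_path": "/Jobs/finalize"},
        },
    ],
}

print(json.dumps(job_settings, indent=2))
```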

Notification Workarounds

  • To detect jobs that succeeded with some failures, configure task-level "Failure" notifications on the individual tasks rather than relying solely on job-level notifications. A failure in any task then triggers an alert regardless of the job's eventual "Success" state (see the first sketch after this list).

  • Alternatively, add an explicit "aggregator" task at the end that checks the state of the upstream tasks (via the Jobs API, or via status flags written to a storage location) and fails itself if any of them failed. This pushes the whole run into a "Failure" state, so the ordinary "Failure" notification fires (see the second sketch below).
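As a minimal sketch of the first option, this is the relevant fragment of a task definition with a task-level "on_failure" email notification (Jobs API 2.1 shape, expressed as a Python dict; the task key, notebook path, and recipient address are hypothetical placeholders):

```python
import json

# Fragment of a Jobs 2.1 task definition with a task-level "on_failure"
# email notification. Task key, path, and recipient are hypothetical.
upstream_task = {
    "task_key": "transform",
    "notebook_task": {"notebook_path": "/Jobs/transform"},
    "email_notifications": {
        # Fires whenever this task fails, even though the job run as a
        # whole may still finish as "Success".
        "on_failure": ["data-team@example.com"],
    },
}

print(json.dumps(upstream_task, indent=2))
```

The same notification block can also be configured per task in the workflow UI, under each task's notification settings.

And a sketch of the aggregator approach: a final leaf task, depending on everything upstream with "All done", that fetches its own run via the Jobs API and fails if any upstream task did not succeed. Assumptions in this sketch: the parent run ID is passed in through the {{job.run_id}} dynamic value reference as a task parameter named parent_run_id, the workspace URL is a placeholder, and the API token lives in a hypothetical secret scope "ops" under key "jobs-api-token":

```python
import requests

# Aggregator notebook (runs inside Databricks, where dbutils is available):
# fail the run if any upstream task did not succeed. Assumes the task
# parameter "parent_run_id" is wired to the {{job.run_id}} dynamic value.
run_id = dbutils.widgets.get("parent_run_id")

host = "https://<workspace-url>"  # replace with your workspace URL
token = dbutils.secrets.get(scope="ops", key="jobs-api-token")  # hypothetical

resp = requests.get(
    f"{host}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": run_id},
)
resp.raise_for_status()

# Tasks still running (including this one) have no result_state yet and
# are skipped; anything terminal other than SUCCESS counts as a failure.
failed = [
    t["task_key"]
    for t in resp.json().get("tasks", [])
    if t.get("state", {}).get("result_state") not in (None, "SUCCESS")
]

if failed:
    # Raising here fails the leaf task, and therefore the whole job run,
    # so the standard job-level "Failure" notification fires.
    raise RuntimeError(f"Upstream tasks failed: {', '.join(failed)}")
```

Because the aggregator is then the leaf that determines the run's outcome, the job-level "Failure" notification effectively becomes a "Succeeded with failures, or worse" alert.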


