Repairing a running workflow with a few failed child jobs
Sunday
I have a parent job that calls multiple child jobs in a workflow. Out of 10 child jobs, one has failed and the other 9 are still running. I want to repair the failed child task. Can I do that while the other child jobs are running?
Labels: Workflows
Sunday
Hi holychs,
How are you doing today? As per my understanding, yes: in Databricks Workflows, if you're running a multi-task job (like your parent job triggering multiple child tasks), you can repair only the failed task without restarting the entire job. However, you need to wait until the rest of the tasks finish running, because Databricks doesn't currently allow you to repair individual tasks while others in the same run are still in progress. Once all the other child jobs have completed (whether they succeeded or failed), go to the job run in the UI, click “Repair run”, and select just the failed task to retry. It's a helpful feature for large workflows because it avoids re-running everything from scratch. Let me know if you want help setting up repair-friendly dependencies or alerts!
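If you would rather script this than use the UI, here is a minimal sketch of the same check in Python. It assumes you have already fetched the parent run as a dict from the Jobs API `runs/get` endpoint, and that the field names (`state.life_cycle_state`, `result_state`, `tasks[].task_key`, `rerun_tasks`) match the Jobs API 2.1 as I remember it, so please verify them against the current docs. The helper names `failed_task_keys` and `build_repair_payload` are hypothetical, just for illustration.

```python
from typing import Any, Dict, List, Optional

# Life-cycle states meaning the run is no longer in progress
# (assumed from the Jobs API 2.1; verify against the docs).
FINISHED_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

# Task result states that we treat as worth re-running (an assumption).
RETRYABLE_RESULTS = {"FAILED", "TIMEDOUT", "CANCELED"}


def failed_task_keys(run: Dict[str, Any]) -> List[str]:
    """Return the task_key of every task whose result state is retryable."""
    keys = []
    for task in run.get("tasks", []):
        result = task.get("state", {}).get("result_state")
        if result in RETRYABLE_RESULTS:
            keys.append(task["task_key"])
    return keys


def build_repair_payload(run: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Build a repair request body, or None if the run cannot be repaired yet.

    Returns None while the parent run is still in progress, or when
    there is nothing to re-run.
    """
    life_cycle = run.get("state", {}).get("life_cycle_state")
    if life_cycle not in FINISHED_STATES:
        return None  # other tasks are still running: wait and poll again
    keys = failed_task_keys(run)
    if not keys:
        return None
    return {"run_id": run["run_id"], "rerun_tasks": keys}
```

When `build_repair_payload` returns a dict, you would send it as the JSON body of a POST to `/api/2.1/jobs/runs/repair` on your workspace. While it returns None because the run is still in progress, poll `runs/get` again after a delay.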
Regards,
Brahma

