Multi-task Jobs orchestration - simulating onComplete status

eq
New Contributor III

Currently, we are investigating how to effectively incorporate Databricks' latest orchestration feature, Multi-task Jobs.

The default behaviour is that a downstream task is not executed if the previous one has failed for some reason.

So the question is: is it currently possible to have an onComplete status (similar to those in Azure Data Factory or SQL Server Integration Services (SSIS)), so that regardless of a task's success or failure we can continue with the workflow and execute the next tasks?


7 REPLIES

Hubert-Dudek
Esteemed Contributor III

It would be really nice to have this as a feature, e.g. run "on success" and run "on failure". Other platforms have something like this, for example Power Automate or QlikView reloads.

Currently I am just using my own try/except logic (so the task doesn't fail 😉).

eq
New Contributor III

I have also been researching how to get this done with a try/except block. For example, if we catch the error but do not re-raise it, simply printing the error to stdout causes the current cell to succeed, so execution continues with the next cell and/or task in the flow.

However, this is very basic and does not account for specific errors; it just prints whatever error you got so that the code in the notebook cell doesn't fail, which leaves you quite exposed and is not considered best practice at all.

try:
    do_something()
except Exception as e:
    # Print and swallow the error so the cell (and the task) still succeeds
    print(e)
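
One incremental improvement, as a sketch (the exception types here are only examples), is to catch just the errors you expect and re-raise anything unexpected, so genuine bugs still fail the task:

try:
    do_something()
except (ValueError, IOError) as e:
    # Expected, recoverable errors: log and let the flow continue
    print(f"recoverable error: {e!r}")
except Exception:
    # Anything unexpected should still fail the task
    raise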

Perhaps there is a smarter way to do this. If you could give a code example of how you achieve the onCompletion status using a try/except block, that would be very beneficial.

Thank you.

Hubert-Dudek
Esteemed Contributor III
(Accepted Solution)

Save repr(e) (and other error details if needed) to a database of your choice (it can also be a Databricks table on DBFS) with a status column (for example, 0 = failed, 1 = success). In the next job you can read the status of the previous job and behave accordingly. This way you also get a nice, compact log.
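
As a minimal sketch of the writing side, assuming a status table named job_run_status and a placeholder do_the_work() for the actual task logic (both names are illustrative):

from datetime import datetime

def run_with_status_logging(task_name):
    try:
        do_the_work()  # placeholder for your actual task logic
        status, error = 1, None
    except Exception as e:
        # Swallow the exception so the task itself still succeeds
        status, error = 0, repr(e)
    # Append one row per run; `spark` is the SparkSession provided by the notebook
    spark.createDataFrame(
        [(task_name, datetime.utcnow(), status, error)],
        "task string, run_ts timestamp, status int, error string",
    ).write.mode("append").saveAsTable("job_run_status")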

Theoretically, in the next notebook we can have something like this (notebook A is run via task orchestration and runs notebook B or notebook C depending on the situation):

# previous_status is read from the status table (0 = failed, 1 = success)
if previous_status == 0:
    # the second argument to dbutils.notebook.run is the timeout in seconds
    dbutils.notebook.run("notebook_to_run_when_previous_failed", 3600)
else:
    dbutils.notebook.run("notebook_to_run_when_previous_ok", 3600)

Anonymous
Not applicable

@Hubert Dudek - Thank you for sharing your knowledge!

jose_gonzalez
Moderator

Hi @Stefan V,

I would highly recommend trying Databricks Jobs. Please check the docs and examples on how to use it here.

eq
New Contributor III

Hi @Jose Gonzalez, thank you.

So far our entire pipeline orchestration has been done via Databricks Jobs. For our new purposes we are trying to re-engineer some of the workflows using the Multi-task Jobs feature, which is far more appealing given the dependencies we have across our pipelines. Hopefully Databricks will release new versions of it in the future that enable cluster re-use, dynamic parametrization, and the above-mentioned onSuccess/onFailure/onCompletion statuses for each task. It would be great for the community.

Many thanks for joining the discussion!

User16844513407
New Contributor III

Hi @Stefan V,

My name is Jan and I'm a product manager working on job orchestration. Thank you for your question. At the moment this is not something directly supported yet; it is, however, on our radar. If you are interested in having a short conversation about what exactly you are trying to achieve and how you are using the new job orchestration capabilities at the moment, please send me an email at jan@databricks.com. We are always interested in feedback to help shape our roadmap, and I see you are mentioning some other topics we are working on as well!

Best regards,

Jan
