
Multi-task Jobs orchestration - simulating onComplete status

eq
New Contributor III

Currently, we are investigating how to effectively incorporate Databricks' latest feature for task orchestration - Multi-task Jobs.

The default behaviour is that a downstream task is not executed if the previous one has failed for some reason.

So the question is: is it currently possible to have an onComplete status (similar to those in Azure Data Factory or SQL Server Integration Services - SSIS) so that, regardless of the task's success or failure, we can continue with the workflow and execute the next tasks?


7 Replies

Hubert-Dudek
Esteemed Contributor III

It would be really nice to add this as a feature: run "on success", run "on failure". Something like this exists on other platforms, for example in Power Automate or QlikView reloads.

Currently I am just using my own try/except logic (so it doesn't fail 😉).
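For reference, a minimal sketch of that kind of wrapper (illustrative only; do_something and the status strings are assumptions, not from this thread). One variant passes the outcome back as the notebook's exit value, which a caller can read when the notebook is invoked with dbutils.notebook.run:

try:
    do_something()  # the real task logic (hypothetical helper)
    status = "success"
except Exception as e:
    # swallow the error so this task reports success and
    # downstream tasks still run; keep the details for later
    status = f"failed: {repr(e)}"

# hand the outcome back to a calling notebook, if any
dbutils.notebook.exit(status)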

eq
New Contributor III

I have also been researching how to get this done with a try/except block. For example, if we catch the error but do not re-raise it, just printing it to stdout causes the current cell to succeed, so execution continues with the next cell and/or task in the flow.

However, this is very basic and does not account for specific errors; it just prints whatever error you got so the notebook cell does not fail, which leaves you exposed (every failure is silently swallowed) and is not considered best practice at all.

try:
    do_something()
except Exception as e:
    # print instead of re-raising so the cell (and task) still succeeds
    print(e)

Perhaps there is a smarter way to do this. If you could give a code example of how you achieve the onCompletion status using the try/except block, it would be very beneficial.

Thank you.

Hubert-Dudek
Esteemed Contributor III (Accepted Solution)

Save repr(e) (and other error details if needed) to a database of your choice (it can also be a Databricks table on DBFS) with a status column (for example, 0 = failed, 1 = success). In the next job you can read the status of the previous job and behave accordingly. This way you also get a nice, compact log.

Theoretically, the next notebook can have something like this (notebook A is run via task orchestration and runs notebook B or notebook C depending on the situation):

# previous_status is read from the status table described above
if previous_status == 0:
    dbutils.notebook.run("notebook_to_run_when_previous_failed", 3600)  # timeout in seconds is required; value illustrative
else:
    dbutils.notebook.run("notebook_to_run_when_previous_ok", 3600)

Anonymous
Not applicable

@Hubert Dudek - Thank you for sharing your knowledge!

jose_gonzalez
Databricks Employee

Hi @Stefan V,

I would highly recommend trying Databricks Jobs. Please check the docs and examples on how to use it here.

eq
New Contributor III

Hi @Jose Gonzalez, thank you.

So far, our entire pipeline orchestration has been done via Databricks Jobs. For our new purposes we are trying to re-engineer some of the workflows using the Multi-task Jobs feature, which is far more appealing given the dependencies we have across our pipelines. Hopefully, future Databricks releases will enable cluster re-use, dynamic parametrization, and the above-mentioned onSuccess/onFailure/onCompletion statuses for each container. It would be great for the community.

Many thanks for joining the discussion!

User16844513407
New Contributor III

Hi @Stefan V,

My name is Jan and I'm a product manager working on job orchestration. Thank you for your question. At the moment this is not directly supported yet; it is, however, on our radar. If you are interested in having a short conversation about what exactly you are trying to achieve and how you are using the new job orchestration capabilities at the moment, please send me an email at jan@databricks.com. We are always interested in feedback to help shape our roadmap, and I see you are mentioning some other topics we are working on as well!

Best regards,

Jan
