
Multi-task Jobs orchestration - simulating onComplete status

eq
New Contributor III

Currently, we are investigating how to effectively incorporate Databricks' latest feature for task orchestration - Multi-task Jobs.

The default behaviour is that a downstream task is not executed if the previous one has failed for some reason.

So the question is: is it currently possible to have an onComplete status (similar to those in Azure Data Factory or SQL Server Integration Services - SSIS) so that, regardless of the task's success or failure, we can continue with the workflow and execute the next tasks?


7 Replies

Hubert-Dudek
Esteemed Contributor III

It would be really nice to add this as a feature: run "on success", run "on failure". Something like this exists on other platforms, for example in Power Automate or QlikView reloads.

Currently I am just using my own try/except logic (so it doesn't fail 😉).
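For reference, a minimal sketch of that kind of wrapper (illustrative only; do_something and the status strings are assumptions, not from this thread). One variant passes the outcome back as the notebook's exit value, which a caller can read when the notebook is invoked with dbutils.notebook.run:

try:
    do_something()  # the real task logic (hypothetical helper)
    status = "success"
except Exception as e:
    # swallow the error so this task reports success and
    # downstream tasks still run; keep the details for later
    status = f"failed: {repr(e)}"

# hand the outcome back to a calling notebook, if any
dbutils.notebook.exit(status)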

eq
New Contributor III

I have also been researching how to get this done with a try/except block. For example, if we catch the error but do not re-raise it, just printing it to stdout causes the current cell to succeed, so execution continues with the next cell and/or task in the flow.

However, this is very basic and does not account for specific errors; it just prints whatever error you got so the notebook cell does not fail, which leaves you exposed (every failure is silently swallowed) and is not considered best practice at all.

try:
    do_something()
except Exception as e:
    # print instead of re-raising so the cell (and task) still succeeds
    print(e)

Perhaps there is a smarter way to do this. If you could give a code example of how you achieve the onCompletion status using the try/except block, it would be very beneficial.

Thank you.

Hubert-Dudek
Esteemed Contributor III (Accepted Solution)

Save repr(e) (and other error details if needed) to a database of your choice (it can also be a Databricks table on DBFS) with a status column (for example, 0 = failed, 1 = success). In the next job you can read the status of the previous job and behave accordingly. This way you also get a nice, compact log.

Theoretically, the next notebook can have something like this (notebook A is run via task orchestration and runs notebook B or notebook C depending on the situation):

# previous_status is read from the status table described above
if previous_status == 0:
    dbutils.notebook.run("notebook_to_run_when_previous_failed", 3600)  # timeout in seconds is required; value illustrative
else:
    dbutils.notebook.run("notebook_to_run_when_previous_ok", 3600)

Anonymous
Not applicable

@Hubert Dudek - Thank you for sharing your knowledge!

jose_gonzalez
Databricks Employee

Hi @Stefan V,

I would highly recommend trying Databricks Jobs. Please check the docs and examples on how to use it here.

eq
New Contributor III

Hi @Jose Gonzalez, thank you.

So far, our entire pipeline orchestration has been done via Databricks Jobs. For our new purposes we are trying to re-engineer some of the workflows using the Multi-task Jobs feature, which is far more appealing given the dependencies we have across our pipelines. Hopefully, future Databricks releases will enable cluster re-use, dynamic parametrization, and the above-mentioned onSuccess/onFailure/onCompletion statuses for each container. It would be great for the community.

Many thanks for joining the discussion!

User16844513407
New Contributor III

Hi @Stefan V,

My name is Jan and I'm a product manager working on job orchestration. Thank you for your question. At the moment this is not directly supported yet; it is, however, on our radar. If you are interested in having a short conversation about what exactly you are trying to achieve and how you are using the new job orchestration capabilities at the moment, please send me an email at jan@databricks.com. We are always interested in feedback to help shape our roadmap, and I see you are mentioning some other topics we are working on as well!

Best regards,

Jan
