10-13-2021 02:14 AM
Currently, we are investigating how to effectively incorporate Databricks' latest feature for orchestrating tasks - Multi-task Jobs.
The default behaviour is that a downstream task would not be executed if the previous one has failed for some reason.
So the question is: is it currently possible to have an onComplete status (similar to those in Azure Data Factory or SQL Server Integration Services - SSIS), so that regardless of whether a task succeeds or fails we can continue with the workflow and execute the next tasks?
10-13-2021 02:35 AM
It would be really nice to add this as a feature, like run "on success" and run "on failure". Something like this exists in other platforms, for example in Power Automate or QlikView reloads.
Currently I am just using my own try/except logic (so it doesn't fail).
10-13-2021 04:08 AM
I have also been researching how to get this done with a try/except block. For example, if we catch the error but do not re-raise it, just printing the error to stdout causes the current cell to succeed and execution continues with the next cell and/or task in the flow.
However, this is very basic and does not account for specific errors: it just prints whatever error you got, so the code in the notebook cell does not fail, but it also leaves you quite exposed and is not considered best practice at all.
try:
    do_something()
except Exception as e:
    print(e)
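A slightly more defensive variant is to catch only the specific exception types we expect and let anything else still fail the task (just a sketch; do_something() and AnalysisException are placeholders for the real workload and whichever errors we actually anticipate):
from pyspark.sql.utils import AnalysisException  # example of a specific, anticipated error

try:
    do_something()      # placeholder for the real workload
except AnalysisException as e:
    print(repr(e))      # tolerate only the errors we anticipated; anything else still fails the task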
Perhaps there is a smarter way to do this; if you could give a code example of how you achieve the onCompletion status using a try/except block, that would be very beneficial.
Thank you.
10-13-2021 04:41 AM
Save repr(e) (and other error details if needed) to a database of your choice (it can also be a Databricks table on DBFS) with a status column (for example 0 = failed, 1 = success). In the next job you can read the status of the previous job and branch depending on it. This way you also get a nice, compact log.
Theoretically, in the next notebook we can have something like this (notebook A is run via task orchestration and runs notebook B or notebook C depending on the situation):
if previous_status == 0:
    # previous task failed, so run the failure-handling notebook (600 = timeout in seconds)
    dbutils.notebook.run("notebook_to_run_when_previous_failed", 600)
else:
    # previous task succeeded, so continue with the normal notebook
    dbutils.notebook.run("notebook_to_run_when_previous_ok", 600)
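To make the pattern concrete, here is a minimal sketch of both halves as they might look in Databricks notebooks (where spark and dbutils are available); the job_status table, its columns, and do_something() are placeholder names for illustration, not an official API:
# Upstream notebook: run the workload and record a 0/1 status plus the error text.
from datetime import datetime
from pyspark.sql import Row

status, error = 1, ""                   # empty error text means success
try:
    do_something()                      # placeholder for the real workload
except Exception as e:
    status, error = 0, repr(e)          # capture the failure instead of letting the task fail

spark.createDataFrame(
    [Row(task="upstream", status=status, error=error, logged_at=datetime.now())]
).write.mode("append").saveAsTable("job_status")

# Notebook A: read the latest row for the upstream task to get previous_status.
last_run = (spark.table("job_status")
            .filter("task = 'upstream'")
            .orderBy("logged_at", ascending=False)
            .first())
previous_status = last_run["status"] if last_run else 0   # treat "no record" as failed
Because the status row also stores repr(e), the same table doubles as the compact error log mentioned above.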
10-13-2021 09:39 AM
@Hubert Dudek - Thank you for sharing your knowledge!
10-13-2021 11:05 AM
Hi @Stefan V,
I would highly recommend trying Databricks Jobs. Please check the docs and examples on how to use it here.
10-14-2021 01:04 AM
Hi @Jose Gonzalez, thank you.
So far our entire pipeline orchestration has been done via Databricks Jobs. For our new purposes we are trying to re-engineer some of the workflows using the Multi-task Jobs feature, which is far more appealing considering the dependencies we have across our pipelines. Hopefully Databricks will release new versions of it in the future to enable cluster re-use, dynamic parametrisation, and the above-mentioned onSuccess/onFailure/onCompletion statuses for each task. It would be great for the community.
Many thanks for joining the discussion!
10-18-2021 06:47 AM
Hi @Stefan V,
My name is Jan and I'm a product manager working on job orchestration. Thank you for your question. At the moment this is not something that is directly supported, but it is on our radar. If you are interested in having a short conversation about what exactly you are trying to achieve and how you are using the new job orchestration capabilities at the moment, please send me an email at jan@databricks.com. We are always interested in feedback to help shape our roadmap, and I see you are mentioning some other topics we are working on as well!
Best regards,
Jan