06-08-2022 01:18 PM
I have a master notebook that runs a few different notebooks on a schedule using the dbutils.notebook.run() function. Occasionally, these child notebooks fail (API connection issues or the like). My issue is that when I attempt to catch the errors with:
try:
    dbutils.notebook.run(notebook_path, timeout_seconds=0)
except Exception as e:
    print(e)
The error is always the same regardless of the notebook/failure point:
An error occurred while calling o8701._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
    at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:98)
    at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:134)
    at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:96)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.databricks.NotebookExecutionException: FAILED
    at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:146)
    at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:93)
    ... 13 more
It would be useful to capture the actual error that occurred in the notebook, rather than the one that just indicates that it failed.
I understand I could catch any exceptions and propagate them using the dbutils.notebook.exit() function, but I'd rather not have to wrap every potential issue in a try-except.
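For reference, that workaround would look roughly like this (a minimal sketch; run_etl and the JSON payload shape are just placeholders I made up):

# In each child notebook: catch any failure and hand the details back to the caller
import json
import traceback

try:
    run_etl()  # placeholder for the notebook's actual work
except Exception:
    dbutils.notebook.exit(json.dumps({"status": "failed", "error": traceback.format_exc()}))
dbutils.notebook.exit(json.dumps({"status": "succeeded"}))

The master notebook can then json.loads() the string returned by dbutils.notebook.run() and log the real error. It works, but it means adding this boilerplate to every child notebook.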
Is there a better way to capture the errors that occur in a child notebook?
06-09-2022 01:02 AM
May I suggest another way of working?
You could use workflows or schedule the notebooks in Glue/Data Factory.
The difference is that the notebooks would not all run on the same cluster, as they do in your current setup.
I don't know if that's an option for you?
06-09-2022 09:26 AM
Thanks for your suggestion, @werners, but that unfortunately won't work.
We originally did have our jobs all scheduled separately, but the growing number of them made things messy since you need to click through the UI to find the jobs, then again to find the errors.
We're now trying to build a framework that can log runs into a table automatically so we can have all that information in one place. It would be mighty helpful if we could also capture what errors occurred so we can recognize the type of error without needing to sift through the UI.
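To give a concrete idea of what the framework does today (a rough sketch; the wrapper name, table name, and columns are placeholders):

import traceback
from datetime import datetime

def run_and_log(notebook_path):
    # Hypothetical wrapper used by the master notebook for every child run
    status, error = "succeeded", None
    try:
        dbutils.notebook.run(notebook_path, timeout_seconds=0)
    except Exception:
        status, error = "failed", traceback.format_exc()
    spark.createDataFrame(
        [(notebook_path, status, error, datetime.utcnow())],
        "notebook string, status string, error string, logged_at timestamp",
    ).write.mode("append").saveAsTable("ops.notebook_runs")  # placeholder table

The catch is that the error column only ever contains the generic WorkflowException shown above, never the underlying failure.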
07-29-2022 11:34 AM
Have you tried using a custom logger to capture these error messages?
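Something like this, shared across the notebooks (a rough sketch; the DBFS log path is a placeholder, and appending to a file through the /dbfs FUSE mount has its own caveats):

import logging

# Hypothetical shared setup: route child-notebook errors to a central log file
logger = logging.getLogger("child_notebooks")
handler = logging.FileHandler("/dbfs/logs/child_notebooks.log")  # placeholder path
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

try:
    run_etl()  # placeholder for the notebook's actual work
except Exception:
    logger.exception("Child notebook failed")  # records the full traceback
    raise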
08-17-2022 07:07 AM
A custom logger would work, but we were hoping for a solution that didn't require us to write specific code in every notebook since the scheduler will be used across teams.
08-16-2022 08:04 AM
Hey there @Caleb Van Tassel
Hope all is well!
Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help.
We'd love to hear from you.
Thanks!
08-17-2022 07:12 AM
Unfortunately, we haven't been able to resolve this. It seems like we're stuck either manually clicking through notebooks, or specifically writing code every time we want an error to persist. Is there a place I can make a feature request? It would be very helpful if Databricks supported catching specific errors in notebooks using dbutils, rather than just throwing the placeholder WorkflowException.
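One partial workaround we've been eyeing (an untested sketch; it assumes you can get hold of the child run's ID, which dbutils.notebook.run() doesn't return, and the host/token values are placeholders) is pulling the failure message from the Jobs runs/get-output REST endpoint:

import requests

# Hypothetical: fetch the error recorded for a given notebook run via the Jobs API
host = "https://<workspace-url>"  # placeholder
token = dbutils.secrets.get("ops-scope", "api-token")  # placeholder secret scope/key

resp = requests.get(
    f"{host}/api/2.0/jobs/runs/get-output",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": run_id},  # run_id has to be obtained separately
)
print(resp.json().get("error"))  # the underlying notebook error, when present

Even then, wiring up the run ID for every child run is more plumbing than we'd like.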
10-18-2022 01:34 PM
I have the same issue. I see no reason that Databricks couldn't propagate the internal exception back through their WorkflowException.