Is there a way to capture the notebook logs from ADF pipeline?

SailajaB
Valued Contributor III

Hi,

I would like to capture custom log exceptions (Python) from a notebook run by an ADF pipeline, so that the pipeline succeeds or fails based on those exceptions.

Is there any mechanism to implement this? In my testing, the ADF pipeline succeeds irrespective of the logged errors.

The notebook always returns SUCCESS to ADF's activity, even when an exception is raised in the notebook. If a notebook raises any exception, the ADF pipeline containing that notebook activity should fail.

Thank you


Prabakar
Esteemed Contributor III

Hi @Sailaja B​, notebook errors are tracked in the driver's log4j output. You can check the cluster's driver logs to get this information, or you can configure log delivery on your cluster so that all messages are logged to the DBFS or storage path that you provide.

Please refer to the document.

https://docs.databricks.com/clusters/configure.html#cluster-log-delivery-1
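For reference, log delivery is just a field on the cluster spec; a minimal sketch of the relevant part (the destination path here is only an example, any DBFS or storage path you control can be used):

# Sketch of the relevant field in a cluster spec (e.g. via the Clusters API or the JSON editor)
cluster_spec = {
    "cluster_name": "my-cluster",                      # example name
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs"}  # driver and executor logs land here
    },
    # ... other cluster settings ...
}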

SailajaB
Valued Contributor III

Thank you for your response.

My question is not about how to store or retrieve the log info.

My scenario is:

The notebook always returns SUCCESS to ADF's notebook activity, even when an exception is raised in the notebook. If a notebook raises any exception, the ADF pipeline containing that notebook activity should fail.

Prabakar
Esteemed Contributor III

Hi @Sailaja B​, a notebook/job failure only happens when there is really a failure. Some exceptions are informational and might not hurt the running notebook. To understand this better, please share the exception that you see in the notebook output.

SailajaB
Valued Contributor III

Here is the sample code

if not any(mount.mountPoint == "/test/" for mount in dbutils.fs.mounts()):
    dbutils.fs.mount(source="***",
                     mount_point="/test/",
                     extra_configs=configs)
else:
    logger.error("Directory is already mounted")

Note: the mount path already exists. If I run this notebook through the ADF pipeline, I expect the pipeline to fail, but it does not fail.

Jose Gonzalez

Hi @Sailaja B​,

You will need to raise/throw the exception to stop your Spark execution. Try using a try/except block to handle your custom exceptions.
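For example, a minimal sketch applied to the mount snippet above (this assumes configs and logger are already defined in the notebook):

if not any(mount.mountPoint == "/test/" for mount in dbutils.fs.mounts()):
    dbutils.fs.mount(source="***", mount_point="/test/", extra_configs=configs)
else:
    logger.error("Directory is already mounted")
    # Raising here fails the notebook run, so the ADF notebook activity is marked Failed
    raise Exception("Directory is already mounted")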

SailajaB
Valued Contributor III

Hi @Jose Gonzalez​ ,

Thank you for your reply.

It is working as expected with try/except and assert False.
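For anyone hitting the same issue, the pattern is roughly the following sketch (run_notebook_logic() is just a placeholder name, not a real function):

try:
    run_notebook_logic()  # placeholder for the notebook's actual work
except Exception as error:
    logger.error(repr(error))
    # The AssertionError propagates out of the notebook, so the ADF activity fails
    assert False, repr(error)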

Hi @Sailaja B​, is it working now? I mean, does the pipeline show a failure when the notebook fails? If yes, please share a sample snippet. (I am trying the same case. I am able to capture logs on the pipeline side using the output JSON, but I couldn't modify the pipeline status.)

-werners-
Esteemed Contributor III

In addition to the option already mentioned, there is also the possibility to analyze logs using Azure Log Analytics:

https://docs.microsoft.com/en-us/azure/databricks/administration-guide/account-settings/azure-diagno...

Hubert-Dudek
Esteemed Contributor III

Also, when you catch the exception you can save it anywhere, even to a Databricks table, with something like:

try:
    (...)
except Exception as error:
    spark.sql(f"""INSERT INTO (...) VALUES ('{repr(error)}')""")
    dbutils.notebook.exit(str(jobId) + ' - ERROR!!! - ' + repr(error))

In my opinion, as @werners said, sending the logs to Azure Log Analytics for detailed analysis is a good choice, but I also like to use the method above and just have a nice table in Databricks with the jobs that failed 😉
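Building on that, a slightly fuller sketch; the table name job_errors and the run_main_logic() helper are only assumptions for illustration, and jobId is assumed to be defined as in the snippet above:

from datetime import datetime

spark.sql("CREATE TABLE IF NOT EXISTS job_errors (job_id STRING, error STRING, ts TIMESTAMP)")

try:
    run_main_logic()  # placeholder for the notebook's actual work
except Exception as error:
    # Append one row per failure, then surface the error to ADF via the exit value
    spark.createDataFrame(
        [(str(jobId), repr(error), datetime.now())],
        "job_id STRING, error STRING, ts TIMESTAMP",
    ).write.mode("append").saveAsTable("job_errors")
    dbutils.notebook.exit(str(jobId) + ' - ERROR!!! - ' + repr(error))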

User16826994569
New Contributor III

Hi SailajaB,

Try this out.

A notebook, once executed successfully, returns a long JSON-formatted output. We need to specify the appropriate nodes to fetch the output.

In the screenshot (not reproduced here), we can see that when the notebook ran it returned empName & empCity as output.

To capture this, we need to:

  • In the respective pipeline, add a VARIABLE (to capture the output of the NOTEBOOK task)
  • Add a SET VARIABLE activity, use the VARIABLE defined in the step above, and add the expression below:

@activity('YOUR NOTEBOOK ACTIVITY NAME').output.runOutput.an_object.name.value

  • Add a link between the NOTEBOOK activity and the SET VARIABLE activity
  • Run your pipeline and you should see the output captured in this variable

Note: If you want to specify a custom return value, you need to use:

dbutils.notebook.exit('VALUE YOU WANT TO RETURN')
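For example, a minimal sketch returning a small JSON payload as the custom value (the field names here are just illustrative):

import json

# In the notebook: return a small JSON payload as the exit value
result = {"status": "SUCCEEDED", "empName": "John", "empCity": "Seattle"}
dbutils.notebook.exit(json.dumps(result))

On the ADF side, the Set Variable expression can then read the returned value from @activity('YOUR NOTEBOOK ACTIVITY NAME').output.runOutput.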

Let me know how it goes.

Cheers

GS
