
Handling Exceptions from dbutils.fs in Python

jcoggs
New Contributor II

I have a notebook that calls dbutils.fs.ls() for some derived file path in azure. Occasionally, this path may not exist, and in general I can't always guarantee that the path exists. When the path doesn't exist it throws an "ExecutionError" which appears to be suppressing the Py4JJavaError. Is there a way to handle this error while letting other exceptions that may occur be raised?

As far as I can tell, the "ExecutionError" class is defined and instantiated from within the function that's suppressing the Py4JJavaError making it locally scoped to that function. I'd rather avoid catching all exceptions if I can only catch this one exception being raised. If it just returned the Py4JJavaError then that could be easily caught and handled.

If there's no way to catch this specific exception being raised then could we request a feature for the class defining this new error be defined under FSHandler so it can be caught?

I've seen other similar questions, but I haven't found answers: https://community.databricks.com/t5/data-engineering/how-to-handle-java-io-exception-in-python-noteb...

2 REPLIES

Palash01
Contributor III

Hey @jcoggs 

The problem looks legitimate, though I've never run into it myself: I pass my mount paths to the pipeline explicitly, through parameters or a variable, which gives more control over the pipeline. See if you can do the same in your use case. If not, let's try to address your concern with a piece of logic:

try:
    # Your mount operation
    dbutils.fs.mount(source="...", mount_point="...", extra_configs={...})
except Exception as e:
    if "java.io.FileNotFoundException" in str(e.java_exception):
        print("Caught a java.io.FileNotFoundException: {}".format(e.java_exception))
    else:
        # Handle other exceptions or re-raise them if needed
        raise

I'm unable to test this code on my end at the moment, so please share your findings on this thread.

Leave a like if this helps! Kudos,
Palash

jcoggs
New Contributor II

Thanks for responding @Palash01

I was hoping to avoid parsing the text of the exceptions looking for errors, but it does seem like that's the way to go. The exception passed doesn't have the java_exception attribute because it's not a Py4JJavaError but rather a generic Exception with the text of the Py4JJavaError. Here's the source code in dbutils we're dealing with:

def prettify_exception_message(f):
    """
    This is a decorator function that aims to properly display errors that happened on the
    Scala side. Without such handling, stack traces from Scala are displayed at the
    bottom of error output, and are easily missed. We fix this by catching Py4JJavaError
    and throwing another exception with the error message from Scala side.
    """

    def f_with_exception_handling(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Py4JJavaError as e:

            class ExecutionError(Exception):
                pass

            # In Python 3, we need to use the new 'raise X from None' syntax
            # to suppress the original exception's traceback. However, we
            # can't directly use that syntax because we need to be compatible
            # with Python 2. It might appear that six's `raise_from` would
            # handle this but that function's implementation is wrong and the
            # six won't fix it: https://github.com/benjaminp/six/issues/193.
            # Therefore, we need this gross hack derived from PEP-409:
            exc = ExecutionError(str(e))
            exc.__context__ = None
            exc.__cause__ = None
            raise exc

    return f_with_exception_handling
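
To illustrate why a class defined inside the handler can't be named in an `except` clause, here is a minimal sketch that runs outside Databricks. `FakeJavaError`, `prettify`, `failing_call` and `capture` are hypothetical stand-ins I made up for the demo; only the shape of the wrapper mirrors the dbutils source quoted above.

```python
class FakeJavaError(Exception):
    """Hypothetical stand-in for py4j's Py4JJavaError."""


def prettify(f):
    """Mimics the quoted decorator: the new exception class is function-scoped."""
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except FakeJavaError as e:
            class ExecutionError(Exception):
                pass

            exc = ExecutionError(str(e))
            exc.__context__ = None
            exc.__cause__ = None
            raise exc
    return wrapper


@prettify
def failing_call():
    raise FakeJavaError("java.io.FileNotFoundException: /no/such/path")


def capture():
    """Call failing_call and return whatever exception it raises."""
    try:
        failing_call()
    except Exception as e:
        return e


first, second = capture(), capture()

# A fresh class object is created on every failure, so the two exception
# types are unrelated, and neither can be imported or referenced by name.
same_class = type(first) is type(second)
class_name = type(first).__name__
```

Both exceptions report the name `ExecutionError`, yet they are instances of two different classes, which is why only `except Exception` can reach them.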

So as you said, we can catch all exceptions and try to determine the type of exception from the text of the error and stack trace, but it will have to be on str(e) instead of str(e.java_exception):

try:
    dbutils.fs.ls(test_location)
except Exception as e:
    if "java.io.FileNotFoundException" in str(e):
        print("Caught a java.io.FileNotFoundException")
    else:
        # Handle other exceptions or re-raise them if needed
        raise

I do wish we could just handle the Py4JJavaError and leave other exceptions unhandled instead of having to re-raise, but I guess it's not really a big deal.
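
If you end up doing this in several places, the check can be pulled into a small helper. This is only a sketch: `is_missing_path_error` and `ls_or_empty` are names I made up, and matching on the class name plus the message text is an assumption based on the source shown above, not a documented contract.

```python
def is_missing_path_error(exc):
    """True if exc looks like the ExecutionError raised for a missing path."""
    return (
        type(exc).__name__ == "ExecutionError"
        and "java.io.FileNotFoundException" in str(exc)
    )


def ls_or_empty(ls, path):
    """Return ls(path), or an empty list when the path does not exist.

    `ls` is the listing callable to use, e.g. dbutils.fs.ls in a notebook.
    Any error that is not a missing-path error is re-raised unchanged.
    """
    try:
        return ls(path)
    except Exception as e:
        if is_missing_path_error(e):
            return []
        raise
```

In a notebook this would be called as `ls_or_empty(dbutils.fs.ls, test_location)`. Passing the listing function in as an argument keeps the helper testable outside Databricks.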

For anyone that is willing to try, though I don't recommend it, I think you could in theory create a custom class that inherits from DBUtils and modifies this method such that you could catch this specific exception. More specifically, I'm thinking you could define the exception class under FSHandler instead of having the class be function scoped. I am wondering why it isn't this way in the source code.
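
For comparison, this is roughly what I mean by not having the class be function scoped. It's a hypothetical sketch, not the real dbutils code: `JavaSideError` stands in for Py4JJavaError, `ls` is a dummy function, and I've used Python 3's `raise ... from None` since Python 2 compatibility doesn't matter for the illustration.

```python
class ExecutionError(Exception):
    """Defined once at module scope, so callers can import and catch it."""


class JavaSideError(Exception):
    """Hypothetical stand-in for py4j's Py4JJavaError."""


def prettify_exception_message(f):
    def f_with_exception_handling(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except JavaSideError as e:
            # Same message rewrite as the original, but every call now
            # raises the one shared ExecutionError class.
            raise ExecutionError(str(e)) from None
    return f_with_exception_handling


@prettify_exception_message
def ls(path):
    # Dummy listing function that always fails like a missing path would.
    raise JavaSideError("java.io.FileNotFoundException: " + path)


try:
    ls("/no/such/path")
except ExecutionError as e:
    # Catchable by name; any other exception type would propagate untouched.
    caught = str(e)
```

With the class shared like this, a narrow `except ExecutionError` works and there is no need to catch everything and re-raise.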

 
