Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Handling Exceptions from dbutils.fs in Python

jcoggs
New Contributor II

I have a notebook that calls dbutils.fs.ls() on a derived file path in Azure. I can't guarantee that the path exists, and occasionally it doesn't. When the path is missing, the call raises an "ExecutionError", which appears to be suppressing the underlying Py4JJavaError. Is there a way to handle this error while letting any other exceptions propagate?

As far as I can tell, the "ExecutionError" class is defined and instantiated inside the function that suppresses the Py4JJavaError, which makes it locally scoped to that function. I'd rather not catch all exceptions if I can catch just this one. If the function simply raised the Py4JJavaError, it could be easily caught and handled.

If there's no way to catch this specific exception, could we request a feature so that the class for this new error is defined under FSHandler, where it can be caught?

I've seen other similar questions, but I haven't found answers: https://community.databricks.com/t5/data-engineering/how-to-handle-java-io-exception-in-python-noteb...

2 REPLIES

Palash01
Valued Contributor

Hey @jcoggs 

The problem looks legitimate, though I haven't run into it myself: I keep my mounts fed to the pipeline manually through parameters or a variable, which gives more control over the pipeline. See if you can do the same in your use case. If not, let's try to address your concern with a piece of logic:

try:
    # Your mount operation
    dbutils.fs.mount(source="...", mount_point="...", extra_configs={...})
except Exception as e:
    if "java.io.FileNotFoundException" in str(e.java_exception):
        print("Caught a java.io.FileNotFoundException: {}".format(e.java_exception))
    else:
        # Handle other exceptions or re-raise them if needed
        raise

I'm unable to test this code on my end at the moment so please share your findings on this thread.  

Leave a like if this helps! Kudos,
Palash

jcoggs
New Contributor II

Thanks for responding @Palash01

I was hoping to avoid parsing the text of the exceptions looking for errors, but it does seem like that's the way to go. The exception passed doesn't have the java_exception attribute because it's not a Py4JJavaError but rather a generic Exception with the text of the Py4JJavaError. Here's the source code in dbutils we're dealing with:

def prettify_exception_message(f):
    """
    This is a decorator function that aims to properly display errors that happened on the
    Scala side. Without such handling, stack traces from Scala are displayed at the
    bottom of error output, and are easily missed. We fix this by catching Py4JJavaError
    and throwing another exception with the error message from Scala side.
    """

    def f_with_exception_handling(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Py4JJavaError as e:

            class ExecutionError(Exception):
                pass

            # In Python 3, we need to use the new 'raise X from None' syntax
            # to suppress the original exception's traceback. However, we
            # can't directly use that syntax because we need to be compatible
            # with Python 2. It might appear that six's `raise_from` would
            # handle this but that function's implementation is wrong and the
            # six won't fix it: https://github.com/benjaminp/six/issues/193.
            # Therefore, we need this gross hack derived from PEP-409:
            exc = ExecutionError(str(e))
            exc.__context__ = None
            exc.__cause__ = None
            raise exc

    return f_with_exception_handling

So as you said, we can catch all exceptions and try to determine the type of exception from the text of the error and stack trace, but it will have to be on str(e) instead of str(e.java_exception):

try:
    dbutils.fs.ls(test_location)
except Exception as e:
    if "java.io.FileNotFoundException" in str(e):
        print("Caught a java.io.FileNotFoundException")
    else:
        # Handle other exceptions or re-raise them if needed
        raise

I do wish we could just handle the Py4JJavaError and leave other exceptions unhandled instead of having to reraise, but I guess it's not really a big deal.
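
A slightly narrower variant, as a rough sketch only: since the wrapper class is literally named ExecutionError in the source quoted above, the handler can check the class name as well as the message text, so an unrelated exception that happens to mention FileNotFoundException still propagates. The helper name list_or_none is made up for illustration, and `ls` is assumed to be dbutils.fs.ls, passed in as an argument so the function can be tried outside a notebook. This depends on the dbutils internals shown in this thread, which could change between runtime versions.

```python
def list_or_none(ls, path):
    """Return ls(path), or None if the path does not exist.

    `ls` is assumed to be dbutils.fs.ls; `list_or_none` is a hypothetical name.
    """
    try:
        return ls(path)
    except Exception as e:
        # ExecutionError is defined inside the dbutils wrapper, so it cannot
        # be imported; match on the class name and the message text instead.
        is_wrapped_java_error = type(e).__name__ == "ExecutionError"
        if is_wrapped_java_error and "java.io.FileNotFoundException" in str(e):
            return None
        # Anything else is re-raised unchanged.
        raise
```

In a notebook this would be called as list_or_none(dbutils.fs.ls, test_location). It still catches Exception and re-raises, so it doesn't remove the re-raise, it only tightens the filter.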

For anyone willing to try, though I don't recommend it: in theory you could create a custom class that inherits from DBUtils and overrides this method so that this specific exception can be caught. More specifically, I'm thinking the exception class could be defined under FSHandler instead of being function scoped. I do wonder why the source code isn't written this way.
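
To illustrate the idea of moving the exception class out of the function, here is a minimal standalone sketch. It is not the dbutils implementation: JavaSideError is a made-up stand-in for Py4JJavaError and fake_ls is a made-up stand-in for dbutils.fs.ls, so the snippet runs without Spark. The only point is that a module-level ExecutionError can be named in an except clause.

```python
import functools


class JavaSideError(Exception):
    """Hypothetical stand-in for Py4JJavaError so the sketch runs anywhere."""


class ExecutionError(Exception):
    """Defined at module level, so callers can import it and catch it by type."""


def prettify_exception_message(f):
    @functools.wraps(f)
    def f_with_exception_handling(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except JavaSideError as e:
            # Python 3 syntax for suppressing the original exception's traceback.
            raise ExecutionError(str(e)) from None

    return f_with_exception_handling


@prettify_exception_message
def fake_ls(path):
    # Hypothetical stand-in for dbutils.fs.ls that always fails.
    raise JavaSideError("java.io.FileNotFoundException: " + path)


try:
    fake_ls("/mnt/missing")
except ExecutionError as e:
    # Only the wrapped Java-side error is handled here; other exception
    # types raised by fake_ls would propagate without a re-raise.
    caught = str(e)
```

With the class at module scope there is no need for the catch-all-and-re-raise pattern. Whether this could be patched into DBUtils in practice is a separate question.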

 
