a month ago
I am using Databricks on Azure.
In PySpark I register a Java UDF:
spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType())
Foo tries to load a file located in a Databricks Unity Catalog volume, using Files.readAllLines().
stderr log:
Tue Jan 7 12:36:33 2025 Connection to spark from PID 1908
Tue Jan 7 12:36:33 2025 Initialized gateway on port 33693
Tue Jan 7 12:36:34 2025 Connected to spark.
2025/01/07 12:36:39 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of langchain. If you encounter errors during autologging, try upgrading / downgrading langchain to a supported version, or try upgrading MLflow.
2025/01/07 12:36:41 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of openai. If you encounter errors during autologging, try upgrading / downgrading openai to a supported version, or try upgrading MLflow.
java.nio.file.FileSystemException: /Volumes/xxxx_volume/config/foo.yaml: Operation not permitted
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)
at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422)
at java.base/java.nio.file.Files.newInputStream(Files.java:160)
at java.base/java.nio.file.Files.newBufferedReader(Files.java:2922)
at java.base/java.nio.file.Files.readAllLines(Files.java:3412)
at java.base/java.nio.file.Files.readAllLines(Files.java:3453)
at XXXXXXXXXXXXXXXXXXXXXXXXXXXXx
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:696)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.base/java.lang.Thread.run(Thread.java:840)
Python itself can access the file; the issue seems to occur only when Java accesses it.
In the Databricks terminal I see the file as:
-rwxrwxrwx 1 nobody nogroup 512 Jan 7 07:57 foo.yaml
The cluster was created from a compute pool.
a month ago
Check file permissions: Ensure that the file foo.yaml has the correct permissions set for the user running the Java process. The file should be accessible by the user under which the Java process runs. You can check the permissions with ls -l in the terminal.
Check mount options: Verify that the volume /Volumes/xxxx_volume is mounted with options that allow Java to access its files. Volumes mounted with certain options can restrict access to specific users or processes.
Run the Java process with elevated privileges: If possible, try running the Java process with elevated privileges (e.g., using sudo) to see whether that resolves the permission issue. Do this with caution, and only if it is safe and appropriate for your environment.
a month ago
In the post I included the current file permissions.
There are no explicit mounts. What exactly should be defined, and where?
I am not running a Java process explicitly; as stated in the post, I only do:
spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType())
a month ago
It looks like this is related to Databricks security: file access is not allowed from the constructor, only from the call() method.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.spark.sql.api.java.UDF2;

public class SomeUDF implements UDF2<String, String, String> {

    public SomeUDF() {
        try {
            // "path" here stands for the volume file location (e.g. a class constant)
            List<String> lines = Files.readAllLines(Paths.get(path)); // <-- Operation not permitted
            for (String line : lines) {
                log.info("line from file: " + line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public String call(String path, String topic) throws Exception {
        try {
            List<String> lines = Files.readAllLines(Paths.get(path)); // works
            for (String line : lines) {
                log.info("line from file: " + line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return "Hello";
    }
}
Any suggestions? I must run initialization code on load that reads file content from the catalog.
4 weeks ago
To run initialization code that reads file content when a UDF (User Defined Function) loads in Databricks, avoid performing file operations in the constructor, due to security restrictions. Instead, use a static block or a singleton pattern so that the initialization code runs only once, when the class is first used.
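A minimal sketch of that idea, using the initialization-on-demand holder idiom: the file is read lazily the first time call() needs it, not in the constructor. Class and field names here (LazyInitUdf, ConfigHolder) and the default path are illustrative assumptions, not from the thread; in the real UDF the class would implement org.apache.spark.sql.api.java.UDF2<String, String, String>, which is omitted so this sketch compiles without Spark on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LazyInitUdf {

    // Initialization-on-demand holder: the JVM initializes ConfigHolder
    // (and so runs the file read) only when call() first touches LINES,
    // i.e. on the executor at invocation time, not at UDF registration.
    private static class ConfigHolder {
        static final List<String> LINES = load();

        private static List<String> load() {
            // The system property is only a stand-in so the demo below can
            // point at a temp file; a real UDF would use the volume path.
            String path = System.getProperty("udf.config", "/tmp/foo.yaml");
            try {
                return Files.readAllLines(Path.of(path));
            } catch (IOException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

    // Stand-in for UDF2.call(String, String): a safe place to touch the file.
    public String call(String path, String topic) {
        return "loaded " + ConfigHolder.LINES.size() + " config lines for " + topic;
    }

    public static void main(String[] args) throws IOException {
        // Demo: write a small temp file and point the holder at it.
        Path tmp = Files.createTempFile("foo", ".yaml");
        Files.write(tmp, List.of("key: value", "other: 42"));
        System.setProperty("udf.config", tmp.toString());
        System.out.println(new LazyInitUdf().call(tmp.toString(), "demo"));
    }
}
```

Because class initialization is performed once per JVM and is thread-safe by the JLS, each executor reads the file a single time regardless of how many rows invoke the UDF.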