Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
UDF java can't access files in Unity Catalog - Operation not permitted

yevsh
New Contributor

I am using Databricks on Azure.


In PySpark I register a Java UDF:

spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType())

Foo tries to load a file located in a Databricks Unity Catalog volume, using Files.readAllLines().


stderr log:

Tue Jan 7 12:36:33 2025 Connection to spark from PID 1908
Tue Jan 7 12:36:33 2025 Initialized gateway on port 33693
Tue Jan 7 12:36:34 2025 Connected to spark.
2025/01/07 12:36:39 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of langchain. If you encounter errors during autologging, try upgrading / downgrading langchain to a supported version, or try upgrading MLflow.
2025/01/07 12:36:41 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of openai. If you encounter errors during autologging, try upgrading / downgrading openai to a supported version, or try upgrading MLflow.
java.nio.file.FileSystemException: /Volumes/xxxx_volume/config/foo.yaml: Operation not permitted
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)
at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422)
at java.base/java.nio.file.Files.newInputStream(Files.java:160)
at java.base/java.nio.file.Files.newBufferedReader(Files.java:2922)
at java.base/java.nio.file.Files.readAllLines(Files.java:3412)
at java.base/java.nio.file.Files.readAllLines(Files.java:3453)
at XXXXXXXXXXXXXXXXXXXXXXXXXXXXx
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:696)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.base/java.lang.Thread.run(Thread.java:840)

Python itself can access the file. The issue seems to occur only when Java accesses it.


In the Databricks web terminal the file shows as:

-rwxrwxrwx 1 nobody nogroup   512 Jan  7 07:57 foo.yaml
The cluster was created from a compute pool.
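As a sanity check (a sketch; the path is the one from the stack trace above), you can compare what Python on the driver reports for the file against the ls -l output:

```python
import os

def describe(path):
    """Return the mode bits and ownership for a path, as Python sees them."""
    st = os.stat(path)
    return {"mode": oct(st.st_mode & 0o777), "uid": st.st_uid, "gid": st.st_gid}

# On the driver this typically succeeds for a Unity Catalog volume path:
# describe("/Volumes/xxxx_volume/config/foo.yaml")
```

If os.stat succeeds from Python but java.nio fails on the same path, that would suggest the restriction lies in how the FUSE mount is exposed to the Java process rather than in the file's mode bits.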

2 REPLIES

Walter_C
Databricks Employee

 

  • Check file permissions: Ensure that foo.yaml has the correct permissions for the user the Java process runs as. You can check the file permissions using the ls -l command in the terminal.

  • Check mount options: Verify that the volume /Volumes/xxxx_volume is mounted with options that allow Java to access the files. Volumes mounted with certain options may restrict access to specific users or processes.

  • Run the Java process with elevated privileges: If possible, try running the Java process with elevated privileges (e.g., using sudo) to see whether that resolves the permission issue. Do this with caution and only if it is safe and appropriate for your environment.
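If the FUSE path stays unreadable from Java, one workaround sketch (hypothetical; it assumes com.foo.Foo can be pointed at an arbitrary path): copy the config out of the Volume to node-local disk with Python before registering the UDF, so the Java code reads an ordinary local file. The stage_config helper name is made up for illustration:

```python
import shutil

def stage_config(volume_path, local_path):
    """Copy a file from a UC Volume (FUSE) path to node-local disk.

    Plain java.nio file APIs can then read the local copy without going
    through the restricted FUSE mount.
    """
    shutil.copyfile(volume_path, local_path)
    return local_path

# Hypothetical usage on the driver, before registering the UDF:
# local = stage_config("/Volumes/xxxx_volume/config/foo.yaml", "/tmp/foo.yaml")
# spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType())
```

The stack trace shows Foo being instantiated on the driver during registration; if executors instantiate it again at query time, the copy would need to exist on every node as well, e.g. distributed via SparkContext.addFile or a cluster init script.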

 

yevsh
New Contributor

In the post I included the current file permissions.

There are no (explicit) mounts; what exactly should be defined, and where?

I am not running a Java process explicitly. As stated in the post, I only do:
spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType())
