I am using Databricks on Azure.
In PySpark I register a Java UDF:
spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType())
Foo tries to load a file located in a Databricks Unity Catalog volume, using Files.readAllLines().
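For context, the Java side looks roughly like this (a minimal sketch; the class name, constructor signature, and field names are illustrative, not the real com.foo.Foo code). Note that the stack trace below shows the read happening inside the constructor, invoked via reflection from UDFRegistration.registerJava, so the failure occurs at registration time on the driver:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative sketch of the UDF class; names and signature are assumptions.
public class Foo {
    private final List<String> configLines;

    public Foo(String configPath) throws IOException {
        // This is the call that fails on the /Volumes FUSE path:
        this.configLines = Files.readAllLines(Path.of(configPath));
    }

    public List<String> lines() {
        return configLines;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temp file instead of the real Volumes path.
        Path tmp = Files.createTempFile("foo", ".yaml");
        Files.write(tmp, List.of("key: value"));
        System.out.println(new Foo(tmp.toString()).lines());
    }
}
```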
The stderr log shows:
Tue Jan 7 12:36:33 2025 Connection to spark from PID 1908
Tue Jan 7 12:36:33 2025 Initialized gateway on port 33693
Tue Jan 7 12:36:34 2025 Connected to spark.
2025/01/07 12:36:39 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of langchain. If you encounter errors during autologging, try upgrading / downgrading langchain to a supported version, or try upgrading MLflow.
2025/01/07 12:36:41 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of openai. If you encounter errors during autologging, try upgrading / downgrading openai to a supported version, or try upgrading MLflow.
java.nio.file.FileSystemException: /Volumes/xxxx_volume/config/foo.yaml: Operation not permitted
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)
at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422)
at java.base/java.nio.file.Files.newInputStream(Files.java:160)
at java.base/java.nio.file.Files.newBufferedReader(Files.java:2922)
at java.base/java.nio.file.Files.readAllLines(Files.java:3412)
at java.base/java.nio.file.Files.readAllLines(Files.java:3453)
at XXXXXXXXXXXXXXXXXXXXXXXXXXXXx
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:696)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.base/java.lang.Thread.run(Thread.java:840)
Python itself can access the file; the issue seems to occur only when Java accesses it.
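For comparison, this is roughly the Python-side check that succeeds in the same notebook (a sketch; the helper name is mine, the path is the one from the stack trace):

```python
from pathlib import Path


def read_config(path: str) -> list[str]:
    # Plain Python file I/O on the same Volumes path works without error.
    return Path(path).read_text().splitlines()


# In the notebook this returns the YAML lines fine:
# read_config("/Volumes/xxxx_volume/config/foo.yaml")
```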
In the Databricks web terminal the file appears as:
-rwxrwxrwx 1 nobody nogroup 512 Jan 7 07:57 foo.yaml
The cluster was created from a compute pool.