a month ago
I am using Databricks on Azure.
In PySpark I register a Java UDF:
spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType())
Foo tries to load a file located in a Databricks Unity Catalog volume, using Files.readAllLines().
stderr log:
Tue Jan 7 12:36:33 2025 Connection to spark from PID 1908
Tue Jan 7 12:36:33 2025 Initialized gateway on port 33693
Tue Jan 7 12:36:34 2025 Connected to spark.
2025/01/07 12:36:39 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of langchain. If you encounter errors during autologging, try upgrading / downgrading langchain to a supported version, or try upgrading MLflow.
2025/01/07 12:36:41 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of openai. If you encounter errors during autologging, try upgrading / downgrading openai to a supported version, or try upgrading MLflow.
java.nio.file.FileSystemException: /Volumes/xxxx_volume/config/foo.yaml: Operation not permitted
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)
at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422)
at java.base/java.nio.file.Files.newInputStream(Files.java:160)
at java.base/java.nio.file.Files.newBufferedReader(Files.java:2922)
at java.base/java.nio.file.Files.readAllLines(Files.java:3412)
at java.base/java.nio.file.Files.readAllLines(Files.java:3453)
at XXXXXXXXXXXXXXXXXXXXXXXXXXXXx
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:696)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.base/java.lang.Thread.run(Thread.java:840)
Python itself can access the file; the issue seems to occur only when Java accesses it.
In the Databricks terminal I see the file as:
-rwxrwxrwx 1 nobody nogroup 512 Jan 7 07:57 foo.yaml
The cluster was created from a compute pool.
a month ago
Check file permissions: Ensure that the file foo.yaml has the correct permissions set for the user running the Java process. The file should be accessible by the user under which the Java process runs. You can check the permissions with ls -l in the terminal.
Check mount options: Verify that the volume /Volumes/xxxx_volume is mounted with options that allow Java to access its files. Volumes mounted with certain options can restrict access to specific users or processes.
Run the Java process with elevated privileges: If possible, try running the Java process with elevated privileges (e.g., using sudo) to see whether that resolves the permission issue. Do this with caution, and only if it is safe and appropriate for your environment.
a month ago
In the post I included the current file permissions.
There are no explicit mounts. What exactly should be defined, and where?
I am not running a Java process explicitly; as stated in the post, I only do:
spark.udf.registerJavaFunction("foo", "com.foo.Foo", T.StringType())
a month ago
It looks like this is related to Databricks security: file access is not allowed from the constructor, only from the call() method.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.spark.sql.api.java.UDF2;

public class SomeUDF implements UDF2<String, String, String> {

    public SomeUDF() {
        try {
            // "path" here stands for the volume file location (e.g. a class constant)
            List<String> lines = Files.readAllLines(Paths.get(path)); // <-- Operation not permitted
            for (String line : lines) {
                log.info("line from file: " + line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public String call(String path, String topic) throws Exception {
        try {
            List<String> lines = Files.readAllLines(Paths.get(path)); // works
            for (String line : lines) {
                log.info("line from file: " + line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return "Hello";
    }
}
Any suggestions? I must run initialization code on load that reads file content from the catalog.
4 weeks ago
To run initialization code that reads file content when a UDF (User Defined Function) loads in Databricks, avoid performing file operations in the constructor, due to security restrictions. Instead, use a static block or a singleton pattern so that the initialization code runs only once, when the class is first used.
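A minimal sketch of that idea, using the initialization-on-demand holder idiom: the file is read lazily the first time call() needs it, not in the constructor. Class and field names here (LazyInitUdf, ConfigHolder) and the default path are illustrative assumptions, not from the thread; in the real UDF the class would implement org.apache.spark.sql.api.java.UDF2<String, String, String>, which is omitted so this sketch compiles without Spark on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LazyInitUdf {

    // Initialization-on-demand holder: the JVM initializes ConfigHolder
    // (and so runs the file read) only when call() first touches LINES,
    // i.e. on the executor at invocation time, not at UDF registration.
    private static class ConfigHolder {
        static final List<String> LINES = load();

        private static List<String> load() {
            // The system property is only a stand-in so the demo below can
            // point at a temp file; a real UDF would use the volume path.
            String path = System.getProperty("udf.config", "/tmp/foo.yaml");
            try {
                return Files.readAllLines(Path.of(path));
            } catch (IOException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

    // Stand-in for UDF2.call(String, String): a safe place to touch the file.
    public String call(String path, String topic) {
        return "loaded " + ConfigHolder.LINES.size() + " config lines for " + topic;
    }

    public static void main(String[] args) throws IOException {
        // Demo: write a small temp file and point the holder at it.
        Path tmp = Files.createTempFile("foo", ".yaml");
        Files.write(tmp, List.of("key: value", "other: 42"));
        System.setProperty("udf.config", tmp.toString());
        System.out.println(new LazyInitUdf().call(tmp.toString(), "demo"));
    }
}
```

Because class initialization is performed once per JVM and is thread-safe by the JLS, each executor reads the file a single time regardless of how many rows invoke the UDF.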