09-01-2023 07:41 AM
I used to use DBFS with mounted directories, and now I want to switch to Volumes for storing my jars and application.conf for pipelines.
I can see application.conf in Data Explorer > Catalog > Volumes, and the file also shows up with dbutils.fs.ls("/Volumes/.../path/to/file"),
but I'm not able to read it with Scala...
> dbutils.fs.ls("/Volumes/.../application.conf")
Cell output:
res52: Seq[com.databricks.backend.daemon.dbutils.FileInfo] = ArrayBuffer(FileInfo(/Volumes/.../application.conf, application.conf, 1602, 1693576640000))
My approaches:
1. pureconfig + custom case class
import pureconfig.{ConfigSource, ConfigReader, ConfigConvert}
import pureconfig.generic.auto._
val config = ConfigSource.file("/Volumes/.../application.conf")
config.loadOrThrow[Config]
Error: ConfigReaderException: Cannot convert configuration to a Config. Failures are: - Unable to read file /Volumes/.../application.conf (No such file or directory).
2. Scala io Source
import scala.io.Source
Source.fromFile("/Volumes/.../application.conf")
Error: FileNotFoundException: /Volumes/.../application.conf (No such file or directory)
3. java io
import java.io.{FileReader, File}
val fr = new FileReader(new File("/Volumes/.../application.conf"))
Error: FileNotFoundException: /Volumes/.../application.conf (No such file or directory)
4. java nio
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
val path = Paths.get("/Volumes/.../application.conf")
new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
Error: FileNotFoundException: /Volumes/.../application.conf (No such file or directory)
09-01-2023 07:53 AM
Still not able to read it with Scala, but it works fine with Python.
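The Python cell wasn't posted; for reference, a minimal sketch of what the working Python side presumably looked like - the built-in open() reads the FUSE-mounted path directly (the path is a placeholder, keep your real Volumes path):

```python
# Sketch of the working Python approach (reconstructed, not the original cell).
def read_conf(path: str) -> str:
    """Read a text file via the local filesystem (the FUSE mount on Databricks)."""
    with open(path, encoding="utf-8") as f:
        return f.read()

# In a notebook: print(read_conf("/Volumes/.../application.conf"))
```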
09-01-2023 08:49 AM
5. java io + scala io
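The code for this attempt wasn't posted; a plausible reconstruction (hypothetical, not the original cell) opens the file with java.io and decodes the stream with scala.io.Source - it fails with the same FileNotFoundException wherever the JVM cannot see the FUSE mount:

```scala
import java.io.FileInputStream
import scala.io.Source

// Hypothetical reconstruction of attempt 5: java.io opens the file,
// scala.io.Source reads the stream. Throws FileNotFoundException when
// the /Volumes FUSE mount is not visible to the JVM.
def readViaStream(path: String): String = {
  val in = new FileInputStream(path)
  try Source.fromInputStream(in).mkString
  finally in.close()
}

// readViaStream("/Volumes/.../application.conf")
```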
01-18-2024 06:15 PM - edited 01-18-2024 06:16 PM
Volume mounts are accessible from Scala code only on a shared cluster. In single user mode this feature is not supported yet. We use init scripts to copy contents from Volumes to the cluster's local drive when we need to access files from native Scala code. Volumes work out of the box for the Spark APIs, though - reading files, checkpoints, etc.
https://docs.databricks.com/en/files/index.html
Important
File operations that require FUSE access to data cannot directly access cloud object storage using URIs. Databricks recommends using Unity Catalog volumes to configure access to these locations for FUSE.
Scala does not support FUSE for Unity Catalog volumes or workspace files on compute configured with assigned access mode or clusters without Unity Catalog. Scala supports FUSE for Unity Catalog volumes and workspace files on compute configured with Unity Catalog and shared access mode.
Try the code below on a shared vs. single user cluster:
%scala
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

// Lists the contents of the Volumes FUSE mount. On a shared access mode
// cluster this prints the catalog directories; on a single user cluster
// it fails because the mount is not visible to the JVM.
Files.list(Paths.get("/Volumes")).iterator().asScala.foreach(println)
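As noted above, the Spark APIs read Volumes paths directly regardless of access mode, so something like this should work on either cluster type. A sketch only - it assumes a Databricks notebook where `spark` is predefined, and the path is a placeholder for the real Volumes path:

```scala
// Sketch: Spark reads Volumes paths without needing the FUSE mount.
val conf = spark.read
  .text("/Volumes/.../application.conf")  // placeholder path
  .collect()
  .map(_.getString(0))
  .mkString("\n")
println(conf)
```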