
graceful dbutils mount/unmount

dchokkadi1_5588
New Contributor II

Is there a way to indicate to dbutils.fs.mount to not throw an error if the mount is already mounted?

And vice versa: is there a way for unmount not to throw an error if it is already unmounted?

I am trying to run my notebook as a job, and it has an init section that mounts the S3 buckets it needs. Sometimes the mounts have already been done by an earlier script.

Since mounting an already-mounted mount (wow) throws an error, my job exits.

1 ACCEPTED SOLUTION


Bill_Chambers
Contributor II

@Deepak Chokkadi

This is the function that I use:

// Mounts an S3 bucket at the given mount point; if the mount point is
// already in use, unmounts it and remounts.
def mountBucket(dstBucketName: String, dstMountName: String): Unit = {
  val accessKey = "YOUR ACCESS KEY"
  // AWS secret keys may contain "/", which must be URL-encoded in the mount URI
  val encodedSecretKey = "YOUR SECRET".replace("/", "%2F")
  try {
    dbutils.fs.mount(s"s3a://$accessKey:$encodedSecretKey@$dstBucketName", dstMountName)
    println("All done!")
  } catch {
    case e: java.rmi.RemoteException =>
      // dbutils.fs.mount reports an existing mount point as a RemoteException,
      // so unmount and call ourselves again to remount cleanly
      println("Directory is already mounted; remounting")
      dbutils.fs.unmount(dstMountName)
      mountBucket(dstBucketName, dstMountName)
    case e: Exception =>
      println("There was some other error: " + e.getMessage)
  }
}

I've put it in a simple, accessible notebook and just run that notebook using %run. Then to mount a bucket I call this function, and it automatically remounts if needed.
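
For example, assuming the helper lives in a shared notebook at /Shared/mount-utils (path and bucket names here are hypothetical), the job notebook would start with a cell containing:

%run /Shared/mount-utils

and then, in a later cell:

// Mounts the bucket, or remounts it if the mount point is already taken
mountBucket("my-data-bucket", "/mnt/my-data-bucket")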


8 REPLIES


DonatienTessier
Contributor

On my side, I test whether the mount point exists before mounting it:

// Mount only if the mount point does not already exist
if (!dbutils.fs.mounts.map(mnt => mnt.mountPoint).contains("/mnt/<directory>"))
  dbutils.fs.mount(
    source = "adl://<datalake_name>.azuredatalakestore.net/<directory>",
    mountPoint = "/mnt/<directory>",
    extraConfigs = configs)
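
The same membership test also covers the "graceful unmount" half of the original question; a minimal sketch along the same lines:

// Unmount only if the mount point currently exists, so a second run is a no-op
if (dbutils.fs.mounts.map(mnt => mnt.mountPoint).contains("/mnt/<directory>"))
  dbutils.fs.unmount("/mnt/<directory>")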

Very nice! This is an equivalent if statement for the mount check in Python:

if any(mount.mountPoint == '/mnt/<directory>' for mount in dbutils.fs.mounts()):

Shouldn't that have a not in Python?

if not any(mount.mountPoint == mountPoint for mount in dbutils.fs.mounts()):

__NikolajPurup
New Contributor II

For Python, you could do something like this:

mountName = 'abc'

# List what is currently under /mnt/ and look for the mount by its FileInfo string
mounts = [str(i) for i in dbutils.fs.ls('/mnt/')]
if "FileInfo(path='dbfs:/mnt/" + mountName + "/', name='" + mountName + "/', size=0)" in mounts:
    print(mountName + " has already been mounted")
else:
    dbutils.fs.mount(
        source = "wasbs://" + mountName + "@<datalake_name>.blob.core.windows.net/",
        mount_point = "/mnt/" + mountName,
        extra_configs = {"fs.azure.sas." + mountName + ".<datalake_name>.blob.core.windows.net": dbutils.secrets.get(scope = "<secret_scope>", key = "<key_name>")})

viswanathboga
New Contributor II

Is there a way to mount a drive with the Databricks CLI? I want the drive to be present from the time the cluster boots up, because I want to use a mounted blob storage location to redirect the logs.

DonatienTessier
Contributor

Hi,

I guess you should create an init script that will be run when the cluster starts.

I asked the question here:

https://forums.databricks.com/questions/17305/mount-blob-storage-with-init-scripts.html

Mariano_IrvinLo
New Contributor II

If you use Scala to mount a Gen2 data lake, you could try something like this:

// Gather relevant keys
var ServicePrincipalID = ""
var ServicePrincipalKey = ""
// DirectoryID should hold the OAuth token endpoint,
// e.g. "https://login.microsoftonline.com/<directory-id>/oauth2/token"
var DirectoryID = ""

// Create configurations for our connection
var configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> ServicePrincipalID,
  "fs.azure.account.oauth2.client.secret" -> ServicePrincipalKey,
  "fs.azure.account.oauth2.client.endpoint" -> DirectoryID)

// Optionally, you can add <directory-name> to the source URI of your mount point.
if (dbutils.fs.mounts.map(mnt => mnt.mountPoint).contains("/mnt/ventas")) {
  println("already mounted")
} else {
  dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mountPoint = "/mnt/ventas",
    extraConfigs = configs)
}
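
Once mounted, the path reads like any other DBFS path; a small usage sketch (the file format and subfolder are hypothetical):

val ventas = spark.read.parquet("/mnt/ventas/<directory-name>")
ventas.show(5)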
