Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Init Scripts with mounted azure data lake storage gen2

repcak
New Contributor III

I'm trying to access an init script that is stored on Azure Data Lake Storage Gen2 mounted to DBFS.

I mounted the storage so the script is available at:

dbfs:/mnt/storage/container/script.sh
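
For context, a minimal sketch of how the mount was created, assuming the standard service-principal (OAuth) pattern from the Azure Databricks docs; the application ID, secret scope, and tenant ID below are placeholders:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}
# Mount the container so it appears under dbfs:/mnt/storage/container
dbutils.fs.mount(
    source="abfss://container@storage_name.dfs.core.windows.net/",
    mount_point="/mnt/storage/container",
    extra_configs=configs,
)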

When I try to use it as a cluster-scoped init script, I get this error:

Cluster scoped init script dbfs:/mnt/storage/container/script.sh failed: Timed out with exception after 5 attempts (debugStr = 'Reading remote file for init script'), Caused by: java.io.FileNotFoundException: /WORKSPACE_ID/mnt/storage/container/script.sh: No such file or directory.

1) I can see this file in DBFS using the %sh magic command in a notebook

2) I can read from this path using spark.read... (both checks are sketched below)
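
Roughly the two checks, as separate notebook cells (a sketch, using the same path as above):

%sh
ls -l /dbfs/mnt/storage/container/script.sh

spark.read.text("dbfs:/mnt/storage/container/script.sh").show(3, truncate=False)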

In the docs I found:

https://docs.databricks.com/dbfs/unity-catalog.html#use-dbfs-while-launching-unity-catalog-clusters-...

Databricks recommends using DBFS mounts for init scripts, configurations, and libraries stored in external storage. This behavior is not supported in shared access mode.

When I try to access this file using abfss://, I get an error:

Failure to initialize configuration for storage account storage_name.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key, Caused by: Invalid configuration value detected for fs.azure.account.key.)

But I used the same credentials as in the mount configuration earlier.

Do init scripts have any limitations with mounted DBFS paths?

I am also concerned about the workspace ID added at the beginning of the path in the error message.

I'm using exactly the same path that I get from this command:

dbutils.fs.ls("/mnt/storage/container/script.sh")

I assume that when the init script runs, the cluster is not yet fully up, so the mount cannot be used to reach ADLS. So I should use abfss:// instead.

But how do I authenticate to this storage? I tried the approach described here:

https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#--access-azure-data-lake-st...

using a service principal in the Spark config, but it doesn't work.
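
For reference, the per-account OAuth keys from that page, as they would go into the cluster's Spark config (a sketch; the application ID, secret scope, and tenant ID are placeholders, and the account name is taken from the error above):

fs.azure.account.auth.type.storage_name.dfs.core.windows.net OAuth
fs.azure.account.oauth.provider.type.storage_name.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id.storage_name.dfs.core.windows.net <application-id>
fs.azure.account.oauth2.client.secret.storage_name.dfs.core.windows.net {{secrets/<scope>/<secret-key>}}
fs.azure.account.oauth2.client.endpoint.storage_name.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token

The "fs.azure.account.key" error suggests these OAuth settings were not in effect for that account when the cluster started.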

Or does this storage need to be public?

1 REPLY

User16752239289
Databricks Employee

I do not think an init script saved under a mount point works, and we do not suggest that.

If you specify abfss, then the cluster needs to be configured so that it can authenticate to and access the ADLS Gen2 folder. Otherwise, the cluster will not be able to load the init script during startup.
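
As a sketch (same placeholder names as in the question): put the service-principal OAuth keys into the cluster's Spark config under Advanced options. They have to be set on the cluster itself, not from a notebook with spark.conf.set, because the init script is fetched before any notebook code runs. Then point the init script destination directly at the abfss URI:

abfss://container@storage_name.dfs.core.windows.net/script.sh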
