Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Init scripts with mounted Azure Data Lake Storage Gen2

repcak
New Contributor III

I'm trying to access an init script that is stored on Azure Data Lake Storage Gen2, mounted to DBFS.

I mounted the storage at

dbfs:/mnt/storage/container/script.sh
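For context, a mount like this is typically created with dbutils.fs.mount using service-principal (OAuth) credentials. The sketch below is an assumption about how the mount was set up; every angle-bracket value, plus the storage account and container names, is a placeholder, not a value from this post:

```python
# Hedged sketch: mounting ADLS Gen2 with a service principal (OAuth).
# All angle-bracket values and the storage/container names are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# dbutils exists only inside a Databricks notebook, so guard the call.
if "dbutils" in globals():
    dbutils.fs.mount(
        source="abfss://container@storage.dfs.core.windows.net/",
        mount_point="/mnt/storage",
        extra_configs=configs,
    )
```

Note that these credentials are attached to the mount itself, which is why reading through the mount can succeed in a notebook even when the cluster has no ADLS auth configured of its own.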

and when I try to access it, I get this error:

Cluster scoped init script dbfs:/mnt/storage/container/script.sh failed: Timed out with exception after 5 attempts (debugStr = 'Reading remote file for init script'), Caused by: java.io.FileNotFoundException: /WORKSPACE_ID/mnt/storage/container/script.sh: No such file or directory.

1) I can see this file in DBFS using the %sh magic command in a notebook.

2) I can read from this path using spark.read...

In the docs I found:

https://docs.databricks.com/dbfs/unity-catalog.html#use-dbfs-while-launching-unity-catalog-clusters-...

Databricks recommends using DBFS mounts for init scripts, configurations, and libraries stored in external storage. This behavior is not supported in shared access mode.

When I try to access this file using abfss://, I get this error:

Failure to initialize configuration for storage account storage_name.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key, Caused by: Invalid configuration value detected for fs.azure.account.key.)

but I used the same credentials as in the "mount credentials" in the previous approach.

Do init scripts have any limitations with mounted DBFS paths?

I am concerned about the workspace ID added to the beginning of the path in the error message.

I'm using exactly the same path that I get from this command:

dbutils.fs.ls("/mnt/storage/container/script.sh")

I assume that when the init script is executed, the cluster is not yet running, so it cannot reach ADLS through the mount. So I should use abfss:// instead.

But how do I authenticate with this storage? I tried this approach:

https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#--access-azure-data-lake-st...

using a service principal in the Spark config, but it doesn't work.

Does this storage need to be public?
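For reference, the service-principal approach from that Microsoft doc is expressed as cluster Spark config entries keyed to the storage account. A sketch of what those entries look like, where every angle-bracket value is a placeholder and the secret is referenced through a Databricks secret scope:

```
fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<scope>/<key>}}
fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
```

Because these are cluster-level settings, they apply before any notebook code runs, which is the kind of auth an init script download would need.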

1 REPLY

User16752239289
Valued Contributor

I do not think init scripts saved under a mount point work, and we do not suggest that approach.

If you specify abfss, the cluster needs to be configured so that it can authenticate to and access the ADLS Gen2 folder. Otherwise, the cluster will not be able to load the init script to run during startup.
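Putting the two pieces together, a cluster definition that points the init script at an abfss path and carries the authentication in its Spark config might look like the sketch below. All names (storage account, container, secret scope, IDs) are placeholders, not values confirmed by this thread:

```json
{
  "init_scripts": [
    {
      "abfss": {
        "destination": "abfss://container@storageaccount.dfs.core.windows.net/script.sh"
      }
    }
  ],
  "spark_conf": {
    "fs.azure.account.auth.type.storageaccount.dfs.core.windows.net": "OAuth",
    "fs.azure.account.oauth.provider.type.storageaccount.dfs.core.windows.net": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id.storageaccount.dfs.core.windows.net": "<application-id>",
    "fs.azure.account.oauth2.client.secret.storageaccount.dfs.core.windows.net": "{{secrets/<scope>/<key>}}",
    "fs.azure.account.oauth2.client.endpoint.storageaccount.dfs.core.windows.net": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
  }
}
```

With this shape, the init script is fetched directly from ADLS Gen2 using the cluster's own credentials, with no dependency on a DBFS mount existing at startup.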
