Init Scripts with mounted azure data lake storage gen2

repcak
New Contributor III

I'm trying to use a cluster-scoped init script that is stored on Azure Data Lake Storage Gen2, which is mounted to DBFS.

I mounted the storage so the script is available at

dbfs:/mnt/storage/container/script.sh
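For reference, a mount like this is typically created with dbutils.fs.mount. A minimal sketch (the secret scope, service principal values, and tenant ID below are placeholders, not my actual configuration):

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    # placeholder secret scope and key names, assumed for illustration
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope="my-scope", key="sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="my-scope", key="sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://container@storage_name.dfs.core.windows.net/",
    mount_point="/mnt/storage/container",
    extra_configs=configs,
)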

When I try to use it as a cluster-scoped init script, I get this error:

Cluster scoped init script dbfs:/mnt/storage/container/script.sh failed: Timed out with exception after 5 attempts (debugStr = 'Reading remote file for init script'), Caused by: java.io.FileNotFoundException: /WORKSPACE_ID/mnt/storage/container/script.sh: No such file or directory.

1) I can see this file in DBFS using the %sh magic command in a notebook.

2) I can read from this path using spark.read (both checks sketched below).
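In a notebook on the same cluster, both of these succeed (spark.read.text here is my assumption about the exact reader used):

# 1) Listing through the mount (equivalent to ls /dbfs/mnt/storage/container/ under %sh)
display(dbutils.fs.ls("dbfs:/mnt/storage/container/"))

# 2) Reading the script contents through Spark
spark.read.text("dbfs:/mnt/storage/container/script.sh").show(truncate=False)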

In the docs I found this:

https://docs.databricks.com/dbfs/unity-catalog.html#use-dbfs-while-launching-unity-catalog-clusters-...

Databricks recommends using DBFS mounts for init scripts, configurations, and libraries stored in external storage. This behavior is not supported in shared access mode.

When I try to access this file using abfss://, I get this error:

Failure to initialize configuration for storage account storage_name.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key, Caused by: Invalid configuration value detected for fs.azure.account.key.

but I used the same credentials as in the mount configuration above.

Do init scripts have any limitations with mounted DBFS paths?

I am also concerned about the workspace ID that gets prepended to the path in the error message.

I'm using exactly the same path that I get from this command:

dbutils.fs.ls("/mnt/storage/container/script.sh")

I assume that when the init script runs, the cluster is not yet fully up, so the mount is not available and I cannot reach ADLS through it. So I should use abfss:// instead.
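That would mean pointing the cluster directly at a URI like this (the container name is assumed for illustration):

abfss://container@storage_name.dfs.core.windows.net/script.sh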

But how do I authenticate to this storage? I tried the approach described here:

https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#--access-azure-data-lake-st...

using a service principal in the Spark config, but it doesn't work.
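What I set follows the documented notebook pattern, roughly like this (the scope, key, application ID, and tenant ID are placeholders for my actual values):

service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.storage_name.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.storage_name.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.storage_name.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.storage_name.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.storage_name.dfs.core.windows.net", "https://login.microsoftonline.com/<tenant-id>/oauth2/token")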

Or should this storage be public?

1 REPLY

User16752239289
Valued Contributor

I do not think an init script saved under a mount point works, and we do not suggest that.

If you specify abfss, then the cluster needs to be configured so that it can authenticate to and access the ADLS Gen2 folder. Otherwise, the cluster will not be able to load the init script during start-up.
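For example, the same service principal settings can be placed in the cluster's Spark config (Advanced options > Spark) so they are applied before the init script runs. A sketch, with the storage account, secret scope, and key names as placeholders:

fs.azure.account.auth.type.storage_name.dfs.core.windows.net OAuth
fs.azure.account.oauth.provider.type.storage_name.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id.storage_name.dfs.core.windows.net <application-id>
fs.azure.account.oauth2.client.secret.storage_name.dfs.core.windows.net {{secrets/<scope>/<secret-key>}}
fs.azure.account.oauth2.client.endpoint.storage_name.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token

With that in place, the init script can be referenced by its abfss:// URI directly.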
