
Databricks cluster Init scripts on ABFSS location

KJ_Saravana
New Contributor III

Hi,

I have an init script that works from a DBFS location during cluster startup, but when the same shell script file is placed on an ABFSS location (ADLS Gen 2 storage), I get the following init script failure and the cluster is unable to start.

Error message: "Cluster scoped init script abfss://XX@XXXX.dfs.core.windows.net/XXXXX/RP_Test/pyodbc-install.sh failed: Failure to initialize configuration for storage account XXXX.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key, Caused by: Invalid configuration value detected for fs.azure.account.key"

On the cluster, I've checked the option "Enable credential passthrough for user-level data access" and confirmed that my user account has the storage blob contributor access.

The same file that worked on DBFS doesn't work on ABFSS. Is there anything I should check or configure before accessing this file on cloud storage?

Accepted Solutions

alexott
Valued Contributor II

Init scripts on ADLS are read through the Hadoop API, so you need to provide the correct Spark configuration using properties prefixed with spark.hadoop. For example, if you use a service principal, you need the following properties (taken from the Terraform example):

spark.hadoop.fs.azure.account.auth.type OAuth
spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<azure_tenant_id>/oauth2/token
spark.hadoop.fs.azure.account.oauth2.client.id <azure_client_id>
spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets/<azure_client_secret_secret_scope>/<azure_client_secret_secret_key>}}
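
As a rough illustration, the properties above could be generated from a few inputs and rendered as the "key value" lines that go into the cluster's Spark config. This is only a sketch: the function names `build_adls_spark_conf` and `render_spark_conf`, and the sample tenant, client, scope, and key values, are hypothetical and not part of any Databricks API.

```python
def build_adls_spark_conf(tenant_id, client_id, secret_scope, secret_key):
    """Return the spark.hadoop.* properties listed above as a dict.

    The client secret is not embedded; it is referenced through the
    {{secrets/<scope>/<key>}} placeholder so the value stays in a secret scope.
    """
    secret_ref = "{{secrets/" + secret_scope + "/" + secret_key + "}}"
    return {
        "spark.hadoop.fs.azure.account.auth.type": "OAuth",
        "spark.hadoop.fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "spark.hadoop.fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "spark.hadoop.fs.azure.account.oauth2.client.id": client_id,
        "spark.hadoop.fs.azure.account.oauth2.client.secret": secret_ref,
    }


def render_spark_conf(conf):
    """Render the dict as one 'key value' pair per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Hypothetical sample values, for illustration only.
conf = build_adls_spark_conf("my-tenant-id", "my-client-id", "my-scope", "my-key")
print(render_spark_conf(conf))
```

The rendered text can be pasted as-is into the Spark config box; only the secret reference, never the secret itself, appears in it.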

karthik_p
Esteemed Contributor

@Saravana KJ it looks like some steps are missing with regard to the key. Can you please check that all the steps in the article below were followed, and then test again? https://www.mssqltips.com/sqlservertip/6499/reading-and-writing-data-in-azure-data-lake-storage-gen-...

KJ_Saravana
New Contributor III

Hi @Alex Ott​ 

Thanks a lot for that clarification. Now I understand that this can be done via Terraform at cluster setup by providing credentials such as the ADLS client ID, tenant ID, and secret. Does that mean we can't set up the cluster through the Databricks UI when we need to access init scripts on an ABFSS location? I can't hardcode my ADLS credentials in the Advanced Options -> Spark Config section.

alexott
Valued Contributor II

You can set it up via the UI or the REST API, no problem. You just need to provide the necessary Spark configuration options that I listed in the code block.
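
For the REST API route, a cluster definition might look roughly like the sketch below, which only builds and prints the JSON body and does not send any request. The helper `build_cluster_payload`, the cluster name, and every `<...>` placeholder are assumptions for illustration; check the field names (`spark_conf`, `init_scripts`, `abfss.destination`) against the Clusters API reference for your workspace before relying on them.

```python
import json


def build_cluster_payload(init_script_uri, spark_conf):
    """Assemble a cluster definition with an ABFSS-hosted init script.

    Hypothetical helper: placeholder values must be replaced before use.
    """
    return {
        "cluster_name": "adls-init-script-demo",  # hypothetical name
        "spark_version": "<spark_version>",
        "node_type_id": "<node_type_id>",
        "num_workers": 1,
        "spark_conf": spark_conf,
        "init_scripts": [{"abfss": {"destination": init_script_uri}}],
    }


# Same properties as in the accepted answer; the secret stays a reference.
spark_conf = {
    "spark.hadoop.fs.azure.account.auth.type": "OAuth",
    "spark.hadoop.fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "spark.hadoop.fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<azure_tenant_id>/oauth2/token",
    "spark.hadoop.fs.azure.account.oauth2.client.id": "<azure_client_id>",
    "spark.hadoop.fs.azure.account.oauth2.client.secret":
        "{{secrets/<secret_scope>/<secret_key>}}",
}

payload = build_cluster_payload(
    "abfss://<container>@<storage_account>.dfs.core.windows.net/<path>/pyodbc-install.sh",
    spark_conf,
)
print(json.dumps(payload, indent=2))
```

In the UI, the equivalent is entering the same properties under Advanced Options -> Spark Config and pointing the init script at the ABFSS path.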

Hi @Alex Ott, thanks for helping us out. We were able to set this up via the UI in the Spark config and can now start the cluster with the init scripts. The spark.hadoop properties worked; in the config we just replaced the ADLS client ID, tenant ID, and secret with the values retrieved from the secret scopes.
