03-20-2023 12:44 AM
Hi,
I have an init script which works from a DBFS location during cluster startup, but when the same shell script is placed on an ABFSS location (ADLS Gen2 storage), I get the following init script failure error and the cluster is unable to start.
Error message: "Cluster scoped init script abfss://XX@XXXX.dfs.core.windows.net/XXXXX/RP_Test/pyodbc-install.sh failed: Failure to initialize configuration for storage account XXXX.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key, Caused by: Invalid configuration value detected for fs.azure.account.key"
On the cluster I've ensured that the option "Enable credential passthrough for user-level data access" is checked and that my user account has Storage Blob Data Contributor access.
The same file that worked on DBFS didn't work on ABFSS. Is there anything I should check or configure before accessing this file on cloud storage?
03-22-2023 09:44 AM
@Saravana KJ it looks like some steps are missing around the account key configuration. Can you please check that all the steps in the article below have been followed, and test again: https://www.mssqltips.com/sqlservertip/6499/reading-and-writing-data-in-azure-data-lake-storage-gen-...
03-23-2023 12:59 PM
To access an init script on ADLS, the Hadoop API is used, so you need to provide the correct Spark configuration via properties prefixed with spark.hadoop. (Credential passthrough won't help here: init scripts run during cluster startup, before any user session exists.) For example, if you use a service principal, you need the following properties (taken from an example for Terraform):
spark.hadoop.fs.azure.account.auth.type OAuth
spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<azure_tenant_id>/oauth2/token
spark.hadoop.fs.azure.account.oauth2.client.id <azure_client_id>
spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets/<azure_client_secret_secret_scope>/<azure_client_secret_secret_key>}}
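For the {{secrets/...}} reference above to resolve, the client secret first needs to exist in a Databricks secret scope. Below is a rough sketch of how that could be done with the Secrets REST API 2.0; the workspace URL, token, scope, and key names are all placeholders, not values from this thread:

# Sketch: store the service principal's client secret in a Databricks secret
# scope so that {{secrets/<scope>/<key>}} in the Spark config can resolve it.
# Host, token, scope name, and key name below are placeholder assumptions.
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# Create a Databricks-backed secret scope (errors if the scope already exists).
requests.post(
    f"{HOST}/api/2.0/secrets/scopes/create",
    headers=HEADERS,
    json={"scope": "adls-credentials"},
).raise_for_status()

# Store the secret under the key referenced by the Spark config, i.e.
# {{secrets/adls-credentials/sp-client-secret}}.
requests.post(
    f"{HOST}/api/2.0/secrets/put",
    headers=HEADERS,
    json={
        "scope": "adls-credentials",
        "key": "sp-client-secret",
        "string_value": "<azure_client_secret>",
    },
).raise_for_status()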
03-24-2023 10:16 AM
Hi @Alex Ott
Thanks a lot for that clarification. Now I understand that this can be done via Terraform at cluster setup by providing credentials such as the ADLS client ID, tenant ID, and secret. Does that mean we can't set up the cluster through the Databricks UI when we need to access init scripts on an ABFSS location? I can't hardcode my ADLS credentials in the Advanced Options -> Spark Config section.
03-24-2023 11:16 AM
You can set it up via the UI or the REST API, no problem; you just need to provide the necessary Spark configuration options that I listed in the code block. A rough sketch of the REST API variant is below.
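Here is a minimal sketch of such a call against the Clusters REST API 2.0 (creating the cluster with the spark.hadoop.* OAuth properties and the ABFSS init script). The cluster sizing, paths, and credential names are placeholder assumptions, not values from this thread:

# Sketch: create a cluster whose Spark config carries the spark.hadoop.*
# OAuth properties and whose init script lives on ABFSS.
# All names, IDs, and paths are placeholders.
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

spark_conf = {
    "spark.hadoop.fs.azure.account.auth.type": "OAuth",
    "spark.hadoop.fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "spark.hadoop.fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<azure_tenant_id>/oauth2/token",
    "spark.hadoop.fs.azure.account.oauth2.client.id": "<azure_client_id>",
    # Resolved from the secret scope at cluster start; never hardcoded.
    "spark.hadoop.fs.azure.account.oauth2.client.secret":
        "{{secrets/<secret_scope>/<secret_key>}}",
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers=HEADERS,
    json={
        "cluster_name": "abfss-init-script-test",
        "spark_version": "12.2.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
        "spark_conf": spark_conf,
        "init_scripts": [{
            "abfss": {
                "destination": "abfss://<container>@<account>.dfs.core.windows.net/<path>/pyodbc-install.sh"
            }
        }],
    },
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])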
03-27-2023 01:59 AM
Hi @Alex Ott, thanks for helping us out. We were able to set this up via the UI in the Spark config and can now start the cluster with the init scripts. The spark.hadoop properties worked; in the config we just replaced the ADLS client ID, tenant ID, and secret with the values retrieved from secret scopes.