Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error on Azure Databricks when writing output to a Blob Storage account

TheoDeSo
New Contributor III

Hello,

After implementing a secret scope to store secrets in an Azure Key Vault, I faced a problem.

When writing output to the blob, I get the following error:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container analysis in account [REDACTED].blob.core.windows.net using anonymous credentials, and no credentials found for them in the configuration.

After some investigation, it appears to be related to the following setting previously defined in the cluster's advanced Spark configuration:

"spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey"

I would like to find a way to set this at the notebook level, after retrieving the secret from the secret scope:

spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")

Unfortunately, this does not work.
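
For reference, the full notebook-level attempt looks roughly like this (the scope and secret names are placeholders, not the real ones):

# Retrieve the storage account key from the secret scope, then set it for this session
account_key = dbutils.secrets.get(scope="<secret-scope-name>", key="<secret-name>")
spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", account_key)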

Below is an example of how I write the output:

df.write.format("com.crealytics.spark.excel") \
  .option("dataAddress", "'%s'!A1" % sheetName) \
  .option("header", "true") \
  .option("dateFormat", "yy-mm-d") \
  .option("timestampFormat", "mm-dd-yyyy hh:mm:ss") \
  .option("useHeader", "true") \
  .mode("append") \
  .save("%s/%s" % (output_blob_folder, outputName))

 

1 ACCEPTED SOLUTION

TheoDeSo
New Contributor III

Hi all, thank you for the suggestions.

Setting the following for the Hadoop configuration does not work:

spark.conf.set("fs.azure.account.key.{storage_account}.dfs.core.windows.net", "{myStorageAccountKey}")

And the suggestion of @Tharun-Kumar would mean hardcoding secrets in the cluster configuration, which is a big no.

Someone else suggested the proper solution on Stack Overflow: add the setting in the same location @Tharun-Kumar suggested, but point it at the secret scope instead of the raw key:

spark.hadoop.fs.azure.account.key.<account_name>.blob.core.windows.net {{secrets/<secret-scope-name>/<secret-name>}}
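
With this reference syntax, the secret is resolved from the secret scope when the cluster starts, so the plaintext key never has to appear in the cluster configuration.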


8 REPLIES

Tharun-Kumar
Databricks Employee

@TheoDeSo 

You need to edit the Spark Config by entering the connection information for your Azure Storage account. 

Enter the following:
spark.hadoop.fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net <ACCESS_KEY>

where <STORAGE_ACCOUNT_NAME> is your Azure Storage account name, and <ACCESS_KEY> is your storage access key.

You need to include this in your Spark config and restart the cluster for it to take effect.

TheoDeSo
New Contributor III

Hello, unfortunately this is not the desired solution, as it involves hardcoding the secret in the cluster configuration. I posted the question on Stack Overflow https://stackoverflow.com/questions/76655569/databricks-not-allowing-to-write-output-to-folder-using... and got the desired answer.

In the cluster configuration I wrote the following:

spark.hadoop.fs.azure.account.key.<account_name>.blob.core.windows.net {{secrets/<secret-scope-name>/<secret-name>}}

Prabakar
Databricks Employee

Please refer to the doc Connect to Azure Data Lake Storage Gen2 and Blob Storage - Azure Databricks | Microsoft Learn. It has the commands you can use to set the Spark config at the notebook level.

# Retrieve the service principal's client secret from the secret scope
service_credential = dbutils.secrets.get(scope="<secret-scope>", key="<service-credential-key>")

# Configure OAuth authentication to the storage account with a service principal
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

Replace

  • <secret-scope> with the Databricks secret scope name.
  • <service-credential-key> with the name of the key containing the client secret.
  • <storage-account> with the name of the Azure storage account.
  • <application-id> with the Application (client) ID for the Azure Active Directory application.
  • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.
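
Once these values are set, a quick way to verify access is to list the container root (the container and account names below are placeholders):

# Placeholder names; replace with your container and storage account
dbutils.fs.ls("abfss://<container>@<storage-account>.dfs.core.windows.net/")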

 

Hemant
Valued Contributor II

@Prabakar you are using a service principal here.

Hemant Soni

Hemant
Valued Contributor II

Hello @TheoDeSo, simply rewrite the configuration:

 

spark.conf.set("fs.azure.account.key.{storage_account}.dfs.core.windows.net", "{myStorageAccountKey}")

 

Use this URI to access the storage account: abfss://{container_name}@{storage_account}.dfs.core.windows.net/

You can check using: dbutils.fs.ls("abfss://{container_name}@{storage_account}.dfs.core.windows.net/")

Hemant Soni

Anonymous
Not applicable

Hi @TheoDeSo 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

 


nguyenthuymo
New Contributor II

Hi all,

Is it correct that Azure Databricks only supports writing data to Azure Data Lake Storage Gen2 and does not support Azure Blob Storage (StorageV2, general purpose)?

In my case, I can read data from Azure Blob Storage (StorageV2, general purpose v2) into Databricks, but when writing data back from Databricks to that blob, I get an error.

Here is my code:

# Define the path where you want to write the table
output_path = "wasbs://powerbiinfo@dlsupeducationdev.blob.core.windows.net/staging/test"

# Write the DataFrame to the specified path in JSON format
try:
    df.write.mode("overwrite").json(output_path)
    print("Write operation successful.")
except Exception as e:
    print(f"Error writing to Azure Blob Storage: {e}")

And the error:
Error writing to Azure Blob Storage: An error occurred while calling o451.json. : shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.lang.IllegalArgumentException: The String is not a valid Base64-encoded string. at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1217)...."
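
For what it's worth, this Base64 error usually means the value supplied for fs.azure.account.key.<account>.blob.core.windows.net is not a valid storage account key (storage keys are Base64-encoded), for example a placeholder or an unresolved secret reference. A minimal sketch of setting the key at notebook level before the write, assuming the key is stored in a secret scope (scope and key names are placeholders):

# Hypothetical secret scope/key names; the storage account name is taken from the path above
storage_account = "dlsupeducationdev"
account_key = dbutils.secrets.get(scope="<secret-scope>", key="<storage-account-key>")

# Configure the legacy WASB driver with the account key for this Spark session
spark.conf.set(f"fs.azure.account.key.{storage_account}.blob.core.windows.net", account_key)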

Any ideas? Please help!

Thanks
Mo

 

 
