Error on Azure-Databricks write RDD to storage account with wasbs://

Vadim1
New Contributor III

Hi, I'm trying to write data from an RDD to a storage account:

Adding the storage account key:

spark.conf.set("fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")

Reading from and writing to the same storage account:

val path = "wasbs://x@y.blob.core.windows.net/data/x.csv"
val df = spark.read.format("csv").load(path)
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")

Error:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container x in account y.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
	at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1037)
	at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:488)
	at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1325)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:603)

The same code works when I save the DataFrame (not the RDD):

df.write.csv("wasbs://x@y.blob.core.windows.net/out/obj.csv")

It looks like the RDD API doesn't know how to connect to the storage account via wasbs://.

Any ideas on how to fix this without a mount (dbutils.fs.mount)?

Thanks!


4 REPLIES

User16764241763
Honored Contributor (Accepted Solution)

Hi,

You probably need the config below for the RDD APIs:

spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")
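
For context, a hedged sketch of why the spark.hadoop. prefix matters, assuming a Scala notebook: DataFrame reads and writes pick up fs.azure.* keys from the session conf, but RDD actions resolve the filesystem through the SparkContext's Hadoop configuration, so the key has to land there. One alternative from a notebook (account "y" and the key value are the thread's own placeholders, not real values) is to set it on the Hadoop configuration directly:

// Sketch, not a verified fix: write the key into the Hadoop configuration
// that RDD methods such as saveAsObjectFile consult when resolving wasbs://.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  "myStorageAccountKey")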

Vadim1
New Contributor III

Hi, thanks a lot aravish! This didn't work from a notebook, but it worked when I added it under Advanced options in the cluster's Spark config:

spark.hadoop.fs.azure.account.key.y.blob.core.windows.net key

Kaniz
Community Manager

Hi @Vadim Z, I'm glad it worked. Would you like to mark the answer as the best?

TheoDeSo
New Contributor III

Hello @Vadim1 and @User16764241763. I'm wondering if you found a way to avoid hardcoding the key in the Advanced options Spark config section of the cluster configuration.

Is there a command similar to spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey") that works at the notebook level, after getting the key from a secret scope?

Kind regards
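
A hedged sketch of the notebook-level approach being asked about, assuming a secret scope and key already exist under the hypothetical names my-scope and my-storage-key: fetch the key with dbutils.secrets.get and set it on the SparkContext's Hadoop configuration, so nothing is hardcoded in the cluster config.

// Hypothetical scope/key names; the account "y" is the thread's placeholder.
val accountKey = dbutils.secrets.get(scope = "my-scope", key = "my-storage-key")

// Setting the key on the Hadoop configuration makes it visible to RDD APIs
// as well as DataFrame reads and writes.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  accountKey)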
