06-03-2022 06:46 AM
Hi, I'm trying to write data from an RDD to a storage account.
Adding the storage account key:
spark.conf.set("fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")
Read from and write to the same storage account:
val path = "wasbs://x@y.blob.core.windows.net/data/x.csv"
val df = spark.read.format("csv").load(path)
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")
Error:
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container x in account y.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1037)
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:488)
at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1325)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:603)
The same code works when I save a DataFrame (not an RDD):
df.write.csv("wasbs://x@y.blob.core.windows.net/out/obj.csv")
It looks like the RDD API doesn't know how to connect to the storage account via wasbs://.
Any ideas on how to fix this without a mount (dbutils.fs.mount)?
Thanks!
- Labels:
  - Azure
  - Azure databricks
Accepted Solutions
06-05-2022 08:26 PM
Hi,
You probably need the config below for the RDD APIs:
- spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")
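For context, here is a minimal sketch of how that suggestion slots into the original code, using the same placeholder container (x), account (y), and key as above. The "spark.hadoop." prefix is meant to copy the setting into the Hadoop configuration that RDD APIs consult when resolving wasbs:// paths; note that the follow-up below reports it only took effect when set at the cluster level rather than from the notebook.
// Sketch only: container (x), account (y), and key are placeholders.
// The "spark.hadoop." prefix targets the Hadoop configuration that
// RDD APIs read, not just the SQL/DataFrame session config.
spark.conf.set(
  "spark.hadoop.fs.azure.account.key.y.blob.core.windows.net",
  "myStorageAccountKey")

val df = spark.read
  .format("csv")
  .load("wasbs://x@y.blob.core.windows.net/data/x.csv")

// If the key is visible at the Hadoop level, the RDD write should
// authenticate the same way the DataFrame writer already does.
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")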
06-05-2022 10:25 PM
Hi, thanks a lot aravish! This didn't work from a notebook, but it worked when I added it under Advanced options in the cluster's Spark config:
spark.hadoop.fs.azure.account.key.y.blob.core.windows.net <storage-account-key>
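A sketch of what the working setup looks like once the key is supplied at the cluster level (same placeholder names as above): no notebook-level spark.conf.set is needed, because the property is already in the Hadoop configuration when the job starts.
// Assumes the cluster's Spark config (Advanced options) contains:
//   spark.hadoop.fs.azure.account.key.y.blob.core.windows.net <storage-account-key>
// With that in place, both the DataFrame read and the RDD write
// resolve wasbs:// using the cluster-level credentials.
val df = spark.read
  .format("csv")
  .load("wasbs://x@y.blob.core.windows.net/data/x.csv")
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")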
07-11-2023 01:11 AM
Hello @Vadim1 and @User16764241763. I'm wondering if you found a way to avoid hardcoding the key in the Advanced options Spark config section of the cluster configuration.
Is there a command similar to spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey") that works at the notebook level after getting the key from a secret scope?
Kind regards
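For illustration, a hypothetical sketch of the pattern being asked about here, not confirmed by this thread: fetch the key from a secret scope with dbutils.secrets.get and write it directly into the SparkContext's Hadoop configuration, which the RDD APIs also read. The scope and secret names are placeholders.
// Hypothetical: "my-scope" and "storage-account-key" are placeholder
// names for a Databricks secret scope and the secret stored in it.
val accountKey = dbutils.secrets.get(scope = "my-scope", key = "storage-account-key")

// Writing directly to the Hadoop configuration (no "spark.hadoop."
// prefix here, since this bypasses the Spark conf layer) is a
// commonly cited notebook-level alternative; untested in this thread.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  accountKey)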

