06-03-2022 06:46 AM
Hi, I'm trying to write data from an RDD to a storage account.
Adding the storage account key:
spark.conf.set("fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")
Reading from and writing to the same storage account:
val path = "wasbs://x@y.blob.core.windows.net/data/x.csv"
val df = spark.read.format("csv").load(path)
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")
Error:
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container x in account y.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1037)
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:488)
at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1325)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:603)
The same code works when I save the DataFrame (not the RDD):
df.write.csv("wasbs://x@y.blob.core.windows.net/out/obj.csv")
It looks like the RDD API doesn't know how to connect to the storage account via wasbs://.
Any ideas on how to fix this without a mount (dbutils.fs.mount)?
Thanks!
06-05-2022 08:26 PM
Hi,
You probably need the below config for the RDD APIs:
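(The config snippet seems to be missing from this post; reconstructing it from the asker's confirmation below, it was presumably the cluster-level Spark config entry:)
spark.hadoop.fs.azure.account.key.y.blob.core.windows.net myStorageAccountKey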
06-05-2022 10:25 PM
Hi, thanks a lot aravish! This didn't work from a notebook, but it worked when I added it under Advanced options in the cluster's Spark config:
spark.hadoop.fs.azure.account.key.y.blob.core.windows.net key
07-11-2023 01:11 AM
Hello @Vadim1 and @User16764241763. I'm wondering if you found a way to avoid adding the hardcoded key in the Advanced options Spark config section of the cluster configuration.
Is there a similar command to spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey") that works at the notebook level after fetching the key from a secret scope?
Kind regards
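(A minimal sketch of the pattern being asked about, assuming a secret scope named "my-scope" with the key stored under "storage-key" — both names are hypothetical. Fetching the key with dbutils.secrets.get avoids hardcoding it in the cluster config; note that setting the property directly on the SparkContext's Hadoop configuration is a commonly suggested notebook-level alternative for RDD code paths, though the thread itself does not confirm this works.)
// Fetch the storage key from a Databricks secret scope
// (scope and key names here are hypothetical)
val storageKey = dbutils.secrets.get(scope = "my-scope", key = "storage-key")
// For DataFrame reads/writes, the session-level config is usually enough:
spark.conf.set("fs.azure.account.key.y.blob.core.windows.net", storageKey)
// RDD APIs read the Hadoop configuration directly, so setting it on the
// SparkContext may be required (assumption, not confirmed in this thread):
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net", storageKey)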