Hi, I'm trying to write data from an RDD to an Azure storage account.
First I add the storage account key to the Spark session configuration:
spark.conf.set("fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")
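(As far as I can tell, spark.conf.set only stores this in the session conf; reading it back works, e.g.:

spark.conf.get("fs.azure.account.key.y.blob.core.windows.net") // returns the key

but I don't know whether the RDD layer ever looks at the session conf — see my guess at the end.)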
Then I read from and write back to the same storage account:
val path = "wasbs://x@y.blob.core.windows.net/data/x.csv"
val df = spark.read.format("csv").load(path)
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")
This fails with:
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container x in account y.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1037)
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:488)
at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1325)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:603)
The same setup works when I save the DataFrame itself (not the RDD):
df.write.csv("wasbs://x@y.blob.core.windows.net/out/obj.csv")
It looks like the RDD write doesn't pick up the wasbs:// credentials that the DataFrame write uses.
Any ideas on how to fix this without a mount (dbutils.fs.mount)?
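My current guess: df.write goes through the Spark SQL path, which reads the session conf, while df.rdd.saveAsObjectFile builds the Hadoop FileSystem from sparkContext.hadoopConfiguration, which never sees the spark.conf.set value. If that's right, would setting the key on the Hadoop configuration directly be the fix? A minimal sketch of what I mean (same placeholder account and key as above, untested):

// Guess: make the key visible to RDD jobs by setting it on the Hadoop
// configuration that the SparkContext hands to the wasbs FileSystem.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  "myStorageAccountKey")
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")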
Thanks!