Hi, I'm trying to write data from an RDD to an Azure storage account.
First I add the storage account key to the Spark session configuration:
spark.conf.set("fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")
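(As far as I can tell, spark.conf.set only stores this in the session conf; reading it back works, e.g.:

spark.conf.get("fs.azure.account.key.y.blob.core.windows.net") // returns the key

but I don't know whether the RDD layer ever looks at the session conf — see my guess at the end.)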
Then I read from and write back to the same storage account:
val path = "wasbs://x@y.blob.core.windows.net/data/x.csv"
val df = spark.read.format("csv").load(path)
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")
This fails with:
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container x in account y.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1037)
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:488)
at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1325)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:603)
The same setup works when I save the DataFrame itself (not the RDD):
df.write.csv("wasbs://x@y.blob.core.windows.net/out/obj.csv")
It looks like the RDD write doesn't pick up the wasbs:// credentials that the DataFrame write uses.
Any ideas on how to fix this without a mount (dbutils.fs.mount)?
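My current guess: df.write goes through the Spark SQL path, which reads the session conf, while df.rdd.saveAsObjectFile builds the Hadoop FileSystem from sparkContext.hadoopConfiguration, which never sees the spark.conf.set value. If that's right, would setting the key on the Hadoop configuration directly be the fix? A minimal sketch of what I mean (same placeholder account and key as above, untested):

// Guess: make the key visible to RDD jobs by setting it on the Hadoop
// configuration that the SparkContext hands to the wasbs FileSystem.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  "myStorageAccountKey")
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")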
Thanks!