Error on Azure-Databricks write RDD to storage account with wasbs://

Vadim1
New Contributor III

Hi, I'm trying to write data from an RDD to a storage account:

Adding the storage account key:

spark.conf.set("fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")

Reading from and writing to the same storage account:

val path = "wasbs://x@y.blob.core.windows.net/data/x.csv"
val df = spark.read.format("csv").load(path)
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")

Error:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container x in account y.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
	at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1037)
	at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:488)
	at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1325)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:603)

The same code works when I save the DataFrame (not the RDD):

df.write.csv("wasbs://x@y.blob.core.windows.net/out/obj.csv")

It looks like the RDD API doesn't know how to connect to the storage account via wasbs://.

Any ideas on how to fix this without a mount (dbutils.fs.mount)?

Thanks!


4 REPLIES

User16764241763
Honored Contributor (Accepted Solution)

Hi,

You probably need the config below for the RDD APIs:

spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")
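
For context, a hedged sketch of why the spark.hadoop. prefix matters, assuming a Scala notebook: DataFrame reads and writes pick up fs.azure.* keys from the session conf, but RDD actions resolve the filesystem through the SparkContext's Hadoop configuration, so the key has to land there. One alternative from a notebook (account "y" and the key value are the thread's own placeholders, not real values) is to set it on the Hadoop configuration directly:

// Sketch, not a verified fix: write the key into the Hadoop configuration
// that RDD methods such as saveAsObjectFile consult when resolving wasbs://.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  "myStorageAccountKey")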

Vadim1
New Contributor III

Hi, thanks a lot aravish! This didn't work from a notebook, but it worked when I added it under Advanced options in the cluster's Spark config:

spark.hadoop.fs.azure.account.key.y.blob.core.windows.net key

Kaniz
Community Manager

Hi @Vadim Z, I'm glad it worked. Would you like to mark the answer as the best?

TheoDeSo
New Contributor III

Hello @Vadim1 and @User16764241763. I'm wondering if you found a way to avoid hardcoding the key in the Advanced options Spark config section of the cluster configuration.

Is there a command similar to spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey") that works at the notebook level, after getting the key from a secret scope?

Kind regards
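
A hedged sketch of the notebook-level approach being asked about, assuming a secret scope and key already exist under the hypothetical names my-scope and my-storage-key: fetch the key with dbutils.secrets.get and set it on the SparkContext's Hadoop configuration, so nothing is hardcoded in the cluster config.

// Hypothetical scope/key names; the account "y" is the thread's placeholder.
val accountKey = dbutils.secrets.get(scope = "my-scope", key = "my-storage-key")

// Setting the key on the Hadoop configuration makes it visible to RDD APIs
// as well as DataFrame reads and writes.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  accountKey)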
