
Error on Azure Databricks writing an RDD to a storage account with wasbs://

Vadim1
New Contributor III

Hi, I'm trying to write data from an RDD to a storage account.

Adding the storage account key:

spark.conf.set("fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")

Read and write to the same storage:

val path = "wasbs://x@y.blob.core.windows.net/data/x.csv"
val df = spark.read.format("csv").load(path)
df.rdd.saveAsObjectFile("wasbs://x@y.blob.core.windows.net/out/out.csv")

Error:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container x in account y.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
	at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1037)
	at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:488)
	at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1325)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:603)

The same code works when I save the DataFrame (not the RDD):

df.write.csv("wasbs://x@y.blob.core.windows.net/out/obj.csv")

It looks like the RDD API doesn't know how to connect to the storage account via wasbs://.

Any ideas on how to fix this without a mount (dbutils.fs.mount)?

Thanks!

1 ACCEPTED SOLUTION

User16764241763
Honored Contributor

Hi,

You probably need the below config for the RDD APIs:

spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey")
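As a side note, spark.conf.set updates the session configuration, and spark.hadoop.* properties are generally only copied into the Hadoop configuration when the SparkContext starts, which would explain why this works from the cluster's Spark config but not always from a notebook. A minimal sketch of a notebook-level alternative (assuming the same account and key as above) is to set the key directly on the SparkContext's Hadoop configuration, which is what the RDD APIs read:

// Set the account key on the Hadoop configuration that RDD APIs use.
// Note: no "spark.hadoop." prefix when setting it here directly.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  "myStorageAccountKey"
)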


3 REPLIES

Vadim1
New Contributor III

Hi, thanks a lot aravish! This didn't work from a notebook, but it did work when I added it under Advanced options in the cluster's Spark config:

spark.hadoop.fs.azure.account.key.y.blob.core.windows.net <myStorageAccountKey>

TheoDeSo
New Contributor III

Hello @Vadim1 and @User16764241763. I'm wondering if you found a way to avoid adding the hardcoded key in the Advanced options Spark config section of the cluster configuration.

Is there a command similar to spark.conf.set("spark.hadoop.fs.azure.account.key.y.blob.core.windows.net", "myStorageAccountKey") that works at the notebook level, after getting the key from a secret scope?

Kind regards
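
A minimal sketch of one way to do this (not from this thread; the scope and secret names below are hypothetical placeholders for ones you would create) is to fetch the key with dbutils.secrets.get and set it on the SparkContext's Hadoop configuration from the notebook:

// In a Databricks notebook, where spark and dbutils are in scope.
// "my-scope" and "storage-account-key" are hypothetical names.
val accountKey = dbutils.secrets.get(scope = "my-scope", key = "storage-account-key")

// RDD APIs read the SparkContext's Hadoop configuration directly.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.y.blob.core.windows.net",
  accountKey
)

At the cluster level, Databricks can also resolve a secret reference of the form {{secrets/my-scope/storage-account-key}} in a Spark config value, which avoids pasting the key itself into Advanced options.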
