Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Invalid configuration value detected for fs.azure.account.key only when trying to save RDD

pavlos_skev
New Contributor III

Hello,

We have encountered a weird issue in our (old) setup that looks like a bug in Unity Catalog. The storage account to which we are trying to persist data is configured via external volumes.

We have a pipeline that ingests XML data and stores it in an RDD. The code then attempts to save the RDD, which fails with the error: Invalid configuration value detected for fs.azure.account.key.

The weird thing is that this only happens when persisting the RDD; persisting a DataFrame works without issue.

path = 'abfss://......'
df = dummyDataframe()
# rdd_processed.collect() -> ['xml content 1', 'xml content 2', ..., 'xml content n']

df.write.text(path + '/staging1/')  # works like a charm; saved files are visible in the storage account
rdd_processed.saveAsTextFile(path + '/staging2/')  # returns the error
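For context, here is a minimal, self-contained sketch of the contrast described above, runnable in a Databricks notebook. The abfss path, XML strings, and DataFrame construction are illustrative stand-ins, not the original pipeline's code:

# Minimal repro sketch; the path and XML contents are placeholders.
path = 'abfss://container@account.dfs.core.windows.net/data'

# An RDD of raw XML strings, standing in for the pipeline's rdd_processed.
rdd_processed = spark.sparkContext.parallelize(['<doc>1</doc>', '<doc>2</doc>'])

# Writing the same strings through a DataFrame succeeds...
df = spark.createDataFrame(rdd_processed.map(lambda s: (s,)), ['value'])
df.write.text(path + '/staging1/')

# ...while the RDD API fails with
# "Invalid configuration value detected for fs.azure.account.key".
rdd_processed.saveAsTextFile(path + '/staging2/')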

 

 

1 ACCEPTED SOLUTION

pavlos_skev
New Contributor III

I'll post here what worked to resolve this error for us, in case someone else encounters it in the future.

It turns out that this error appears when the command below is run while the 'staging2' directory already exists. To avoid the error, the 'staging2' directory has to be deleted before calling 'saveAsTextFile'.

rdd_processed.saveAsTextFile(path + '/staging2/')

The weird thing is that we were already doing that, but we would still get the error. We had a notebook cell that deleted the path path + '/staging2/', and then in the next cell the above command would run and still raise the error.

It turns out that, to address this, the delete command for the path has to be in the exact same cell as the 'saveAsTextFile' line of code. When we put both in the same cell, the error no longer appeared and saving rdd_processed succeeded. This is definitely a bug, as it doesn't make sense why this works, but for now at least there's a workaround.
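Here is a minimal sketch of the working pattern, assuming dbutils.fs.rm as the delete command (the post doesn't show which delete call was used); both statements must sit in the same notebook cell:

# Both statements in one cell; splitting them across cells reproduced the error.
target = path + '/staging2/'
dbutils.fs.rm(target, True)           # recursively delete any existing output directory
rdd_processed.saveAsTextFile(target)  # now succeeds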


2 REPLIES

pavlos_skev
New Contributor III

Hello Kaniz,

Something I forgot to mention in the OP is that we are using Unity Catalog volumes to connect to the storage account; they are tested and work properly. I found the solution, which seems like a major bug unless I am missing something. I will post my answer in another comment. Thanks for reaching out.
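(As an aside, one quick way to sanity-check that a Unity Catalog volume resolves is to list its root; the catalog, schema, and volume names below are hypothetical.)

# Hypothetical volume path; lists its contents if the volume is wired up correctly.
display(dbutils.fs.ls('/Volumes/main/default/xml_staging/'))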

