10-26-2023 05:18 AM
Hello,
We have encountered a strange issue in our (old) setup that looks like a bug in Unity Catalog. The storage account we are trying to write to is configured via external volumes.
We have a pipeline that fetches XML data and stores it in an RDD. The code then attempts to save the RDD, which fails with the error: Invalid configuration value detected for fs.azure.account.key.
The strange part is that this only happens when persisting the RDD; persisting a DataFrame under the same path works without any issue.
path = 'abfss://......'
df = dummyDataframe()
# rdd_processed.collect() -> ['xml content 1', 'xml content 2', ..., 'xml content n']
df.write.text(path+'/staging1/') # works like a charm, can view saved files in SA
rdd_processed.saveAsTextFile(path+'/staging2/') # returns the error
Labels: Delta Lake, Spark
Accepted Solutions
10-31-2023 05:42 AM
I am posting what resolved this error for us, in case someone else runs into it in the future.
It turns out that, in our case, the error appears when the command below runs while the directory 'staging2' already exists. To avoid the error, the 'staging2' directory has to be deleted before calling 'saveAsTextFile'.
rdd_processed.saveAsTextFile(path+'/staging2/')
The strange thing is that we were already doing that and still got the error. We had one notebook cell that deleted the path path+'/staging2/', and the command above ran in the next cell and failed.
To address this, the delete command has to be in exactly the same cell as the 'saveAsTextFile' call. Once we put both in the same cell, the error no longer appeared and rdd_processed was saved successfully. This looks like a bug, since it is not clear why it works, but for now at least there is a workaround.
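For anyone who wants to keep the two steps from drifting apart into separate cells again, here is a minimal sketch that wraps the delete and the save in one function. The helper name save_rdd_overwriting and the remove_dir parameter are my own, not part of any Spark or Databricks API; in a Databricks notebook I would assume remove_dir is something like lambda p: dbutils.fs.rm(p, True).

```python
def save_rdd_overwriting(rdd, target, remove_dir):
    """Delete `target` and then write `rdd` there as text, back to back.

    `remove_dir` is a hypothetical callable that recursively deletes a path.
    In a Databricks notebook it would typically be
        lambda p: dbutils.fs.rm(p, True)
    """
    # Step 1: remove any previous output directory.
    remove_dir(target)
    # Step 2: write the RDD. saveAsTextFile refuses to write into an
    # existing directory, which is why the delete has to come first.
    rdd.saveAsTextFile(target)


# Usage inside a single notebook cell (names taken from the original post):
# save_rdd_overwriting(rdd_processed, path + '/staging2/',
#                      lambda p: dbutils.fs.rm(p, True))
```

Calling the helper from one cell guarantees that the delete and the save always run together, which is the condition that made the error go away for us. It does not explain why running them in separate cells fails.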
10-31-2023 02:40 AM
Hello Kaniz,
Something I forgot to mention in the original post is that we are using Unity Catalog volumes to connect to the storage account; these have been tested and work properly. I found a solution to what seems like a major bug, unless I am missing something. I will post my answer in another comment. Thanks for reaching out.

