06-13-2023 07:18 AM
Hello everyone,
I am really new to Databricks; I just passed my Apache Spark developer certification on it, and I also have a certification in data engineering with Azure. Those are fancy words, but I only started doing real, deep work with these tools when I began a personal project I'm really excited about.
My issue is with accessing the storage account through Databricks using a managed identity. Concretely:
1/ Created an access connector for Databricks.
2/ Created a metastore, linked it to the access connector, and linked the metastore to my Databricks workspace.
3/ Created a storage credential and an external location (see the sketch after this list).
4/ I could query the container with two different methods, but not the last one.
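For step 3, the external location part looked roughly like this. This is only a sketch: the names are placeholders, and I created the storage credential itself from the access connector in the Catalog Explorer UI.
%python
# Sketch with placeholder names: "mi_credential" is the storage credential that
# wraps the access connector's managed identity; the external location binds
# the container path to that credential.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS raw_location
  URL 'abfss://container@accstore.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL mi_credential)
""")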
So far I have tried two ways that work just fine:
1/
%sql
CREATE TABLE raw.table;
COPY INTO raw.table
FROM 'abfss://container@accstore.dfs.core.windows.net/'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true')
2/ works perfectly:
%python
df = spark.read.schema(schema).csv(
    "abfss://raw@twitterdatalake.dfs.core.windows.net/",
    header=True, escape='"', quote='"', multiLine=True)  # inferSchema=True
3/ doesn't work:
%sql
DROP TABLE IF EXISTS raw.table;
CREATE EXTERNAL TABLE raw.table
USING CSV
OPTIONS (path 'abfss://raw@accstore.dfs.core.windows.net/', header 'true', inferSchema 'true');
FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART@accsstore.dfs.core.windows.net/file.csv.
Caused by: KeyProviderException: Failure to initialize configuration for storage account twitterdatalake.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key
Caused by: InvalidConfigurationValueException: Invalid configuration value detected for fs.azure.account.key
Yes, I know you will ask why I need this particular way. I don't know; I just saw it a lot while preparing for the certification exam, so I guess it's a best practice? Furthermore, the fact that it doesn't work is really, really annoying me.
Does anyone have an idea why it doesn't work?
Thank you!
Have a good day
06-13-2023 07:13 PM
You should try using PySpark in all of your locations to verify, with
df = spark.sql("select * from <catalog.schema.table>")
df.display()
Do this after you make your managed table in your desired external location path, of course.
spark.sql("create schema if not exists <schema name> managed location <external location url path>")
spark.sql("create table if not exists <schema name>.<table name> managed location <external location url path>")
https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-schema.html
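Put together, the flow might look like this in a notebook (the catalog, schema, and table names below are hypothetical; the point is that you address the table through Unity Catalog instead of by its abfss:// path):
%python
# Hypothetical names throughout: create the schema over the external location,
# then a managed table inside it, and query it by name rather than by path.
spark.sql("CREATE SCHEMA IF NOT EXISTS main.raw "
          "MANAGED LOCATION 'abfss://raw@accstore.dfs.core.windows.net/managed'")
spark.sql("CREATE TABLE IF NOT EXISTS main.raw.tweets (id BIGINT, text STRING)")
df = spark.sql("SELECT * FROM main.raw.tweets")
df.display()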
06-29-2023 08:55 AM
Same here.
I have created a storage credential.
I have created the external location.
I can use COPY INTO to copy data from the external location.
I can use the Load data UI from Azure.
What I cannot do is use spark.read or dbutils.fs.ls on the external location; it fails with "Invalid configuration value detected for fs.azure.account.key".
I mean, why do I need to set the properties if it works elsewhere? Isn't that the point of creating the storage credential after all?
07-25-2023 05:12 AM - edited 07-25-2023 05:13 AM
So after attending the Office Hours I came to realise that apparently external locations are not meant to be used like that and don't support direct Spark access. For that you still need to access the locations with the standard access methods, and the storage credential doesn't help.
HOWEVER, having said that, there is a new feature now in public preview called Volumes that, surprise surprise, also comes with external volumes. And imagine that: Spark works, dbutils works, etc. (see the sketch below). So I guess if your use case involves easily accessing file-based content from external storage, this seems to be the way to go. On top of that, you can use Unity Catalog for managing access and permissions. Case closed.
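For anyone finding this later, here is a sketch of what I mean, with hypothetical catalog/schema/volume names:
%python
# Hypothetical names: an external volume created over the external location.
# Once it exists, both dbutils and spark.read work against the /Volumes path.
spark.sql("CREATE EXTERNAL VOLUME IF NOT EXISTS main.raw.landing "
          "LOCATION 'abfss://raw@accstore.dfs.core.windows.net/landing'")
display(dbutils.fs.ls("/Volumes/main/raw/landing"))
df = spark.read.csv("/Volumes/main/raw/landing", header=True)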
07-26-2023 12:42 AM
Hi,
If we go by the error:
Invalid configuration value detected for fs.azure.account.key
the storage account access key needed to access data over the abfss protocol is missing or invalid. Please refer to https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#access-adls-gen2-directly
The required parameters are listed in that documentation.
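For example, direct access with an account key stored in a secret scope looks roughly like this (the scope and key names below are placeholders):
%python
# Placeholder secret scope/key names. Per the linked docs, reading an abfss://
# path directly requires the account key (or a service principal OAuth config)
# to be set in the Spark conf for that storage account.
spark.conf.set(
    "fs.azure.account.key.twitterdatalake.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"))
df = spark.read.csv("abfss://raw@twitterdatalake.dfs.core.windows.net/", header=True)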
Please tag @Debayan in your next comment, which will notify me. Thanks!