
Azure - Databricks - account storage gen 2

db_noob
New Contributor II

Hello everyone,

I am really new to Databricks; I just passed my Apache Spark developer certification on it.

I also have a certification in data engineering with Azure.

Those are some fancy words, but I only started doing real, deep work with these tools when I began a personal project I'm really excited about.

My issue is with accessing the storage account through Databricks using a managed identity.

Here is what I did:

1/ Created an Access Connector for Azure Databricks.

  • Created its managed identity and gave it the Delegator role on the storage account plus Contributor on the container.

2/ Created a metastore, linked it to the Databricks access connector, and attached it to my Databricks workspace.

3/ Created a storage credential and an external location (a sketch of this step follows below).

4/ I could query the container with two of the three methods below, but not the last one.
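
For reference, step 3 roughly looks like this in SQL. This is a minimal sketch rather than my exact commands; it assumes a storage credential named creds_mi was already created from the access connector (for example via Catalog Explorer), and the location name, container, and account are purely illustrative.

%python
# Minimal sketch of step 3; assumes the storage credential `creds_mi` already
# exists, and the container/account names are illustrative.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS raw_landing
  URL 'abfss://container@accstore.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL creds_mi)
""")

# Optionally grant read access on the new location to a group.
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION raw_landing TO `account users`")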

So far I have tried two ways that work just fine:

1/ Works:

%sql
CREATE TABLE raw.table;
COPY INTO raw.table
FROM 'abfss://container@accstore.dfs.core.windows.net/'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true')

2/ Works perfectly:

%python
df = spark.read.schema(schema).csv(
    "abfss://raw@twitterdatalake.dfs.core.windows.net/",
    header=True, escape='"', quote='"', multiLine=True
)  # or inferSchema=True instead of an explicit schema

3/ Doesn't work:

%sql
DROP TABLE IF EXISTS raw.table;
CREATE EXTERNAL TABLE raw.table
USING CSV
OPTIONS (path "abfss://raw@accstore.dfs.core.windows.net/", header 'true', inferSchema 'true');

It fails with:

FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART@accsstore.dfs.core.windows.net/file.csv.
Caused by: KeyProviderException: Failure to initialize configuration for storage account twitterdatalake.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key
Caused by: InvalidConfigurationValueException: Invalid configuration value detected for fs.azure.account.key

Yes, I know you will ask why I need this particular approach.

I don't know; I just saw it a lot while preparing for the certification exam, so I guess it's a best practice?

Furthermore, the fact that it doesn't work is really, really annoying me.

Does anyone have an idea why it doesn't work?

Thank you!

Have a good day

5 REPLIES

etsyal1e2r3
Honored Contributor

You should try using PySpark against all of your locations to verify, with:

df = spark.sql("select * from <catalog.schema.table>")
df.display()

Do this after you create your managed table in your desired external location path, of course.

spark.sql("CREATE SCHEMA IF NOT EXISTS <schema name> MANAGED LOCATION '<external location url path>'")
spark.sql("CREATE TABLE IF NOT EXISTS <schema name>.<table name>")  # a managed table inherits the schema's managed location

https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-schema.html
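
Put together, a runnable version of that suggestion might look like this (the catalog, schema, and table names and the URL are placeholders, not taken from the thread):

%python
# Illustrative end-to-end version of the suggestion above; all names and the
# URL are placeholders.
external_path = "abfss://container@accstore.dfs.core.windows.net/raw"

# Create a schema whose managed tables are stored under the external location path.
spark.sql(f"CREATE SCHEMA IF NOT EXISTS main.raw MANAGED LOCATION '{external_path}'")

# Create a managed table in that schema; it inherits the schema's managed location.
spark.sql("CREATE TABLE IF NOT EXISTS main.raw.tweets (id BIGINT, text STRING)")

# Verify access by reading the table back through Unity Catalog.
spark.sql("SELECT * FROM main.raw.tweets").display()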

Kaniz
Community Manager

Hi @Erraji Badr, we haven't heard from you since the last response from @Tyler Retzlaff, and I was checking back to see if their suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

Heikko
New Contributor III

Same here.

I have created a storage credential.

I have created the external location.

I can use COPY INTO to copy data from the external location.

I can use the Load data UI from Azure.

What I cannot do is use spark.read or dbutils.fs.ls on the external location; it fails with "Invalid configuration value detected for fs.azure.account.key".
I mean, why do I need to set those properties if it works elsewhere? Isn't that the point of creating the storage credential in the first place?
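
For reference, the kind of call that fails looks roughly like this (the container and account names are placeholders):

%python
# Sketch of the direct-path access that triggers the fs.azure.account.key error;
# the container/account names are placeholders.
path = "abfss://container@accstore.dfs.core.windows.net/raw/"

# Listing the path with dbutils and reading it with spark.read both fail with
# "Invalid configuration value detected for fs.azure.account.key",
# even though an external location covers the path.
dbutils.fs.ls(path)
df = spark.read.csv(path, header=True)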

Heikko
New Contributor III

So after attending the Office Hours, I came to realise that External Locations are apparently not meant to be used like that and don't support spark.read directly. For that, you still need to access the locations using the standard access methods, and the storage credential doesn't help there.

HOWEVER, having said that, there is a new feature now in public preview called Volumes, which, surprise surprise, also come as External Volumes. And imagine that: spark.read works, dbutils.fs works, etc. So if your use case involves easily accessing file-based content from external storage, this seems to be the way to go. On top of that, you can use Unity Catalog for managing access and permissions. Case closed.
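
A rough sketch of what that looks like (the catalog, schema, volume name, and URL are placeholders; Volumes were in public preview at the time of writing):

%python
# Rough sketch of using an external volume; all names and the URL are placeholders.
# Create an external volume that points at a path covered by an external location.
spark.sql("""
  CREATE EXTERNAL VOLUME IF NOT EXISTS main.raw.landing
  LOCATION 'abfss://container@accstore.dfs.core.windows.net/raw'
""")

# Files are then addressable through the /Volumes path with both dbutils and Spark.
dbutils.fs.ls("/Volumes/main/raw/landing/")
df = spark.read.csv("/Volumes/main/raw/landing/", header=True)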

Debayan
Esteemed Contributor III

Hi,

If we go by the error,

Invalid configuration value detected for fs.azure.account.key

it looks like the storage account access key needed to access data over the abfss protocol could not be used. Please refer to https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#access-adls-gen2-directly
The required parameters are described in that documentation.
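
For illustration, the direct-access pattern from that page looks roughly like this (the storage account name, secret scope, and secret key name are placeholders):

%python
# Rough sketch of direct access with a storage account key, per the linked docs.
# The storage account name, secret scope, and secret key name are placeholders.
spark.conf.set(
    "fs.azure.account.key.accstore.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="accstore-access-key"),
)

# Once the config is set, abfss paths can be read directly.
df = spark.read.csv("abfss://container@accstore.dfs.core.windows.net/raw/", header=True)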
Please tag @Debayan in your next comment, which will notify me. Thanks!
