Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Azure - Databricks - account storage gen 2

db_noob
New Contributor II

Hello everyone,

I am really new to Databricks; I just passed my Apache Spark developer certification on it.

I also have a certification in data engineering with Azure.

Those are some fancy words, but I only started doing real, deep work with these tools when I began a personal project I'm really excited about.

My issue is with accessing the storage account through Databricks with the help of a managed identity.

Here is what I did:

1/ Created an access connector for Databricks.

  • Created its identity and gave it the delegator role on the storage account, plus the contributor role on the container.

2/ Created a metastore, linked it to the Databricks access connector, and linked the metastore to my Databricks workspace.

3/ Created a storage credential and an external location (a sketch of this step follows this list).

4/ I could query the container with two different methods, but not with a third.
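
For reference, step 3 can be scripted. A minimal sketch, assuming the storage credential from the access connector in step 1 already exists (the credential name, location name, and URL below are placeholders):

%python
# Sketch only: "db_access_connector", "raw_location", and the URL are placeholders.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS raw_location
    URL 'abfss://container@accstore.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL db_access_connector)
""")
# Grant the privileges a notebook user needs to query files there.
spark.sql("GRANT READ FILES, CREATE EXTERNAL TABLE ON EXTERNAL LOCATION raw_location TO `account users`")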

So far I have tried two ways that work just fine.

1/

%sql
CREATE TABLE IF NOT EXISTS raw.table;
COPY INTO raw.table
FROM 'abfss://container@accstore.dfs.core.windows.net/'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true')

2/ Works perfectly:

%python
df = spark.read.schema(schema).csv(
    "abfss://raw@twitterdatalake.dfs.core.windows.net/",
    header=True, escape='"', quote='"', multiLine=True
)  # alternative: inferSchema=True instead of an explicit schema

3/ Doesn't work:

%sql
DROP TABLE IF EXISTS raw.table;
CREATE EXTERNAL TABLE raw.table
USING CSV
OPTIONS (path "abfss://raw@accstore.dfs.core.windows.net/", header 'true', inferSchema 'true');
FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART@accsstore.dfs.core.windows.net/file.csv.
Caused by: KeyProviderException: Failure to initialize configuration for storage account twitterdatalake.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key
Caused by: InvalidConfigurationValueException: Invalid configuration value detected for fs.azure.account.key
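
A quick way to narrow this down (a sketch, assuming an external location already covers this container): if the listing below succeeds but the CREATE TABLE above still fails, the cluster is likely not running in a Unity Catalog access mode, so Spark falls back to the fs.azure.account.key setting.

%python
# Diagnostic sketch; the URL is a placeholder for your container path.
display(spark.sql("LIST 'abfss://raw@accstore.dfs.core.windows.net/'"))  # resolved through Unity Catalog
dbutils.fs.ls("abfss://raw@accstore.dfs.core.windows.net/")              # the same check via dbutils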

Yes, I know you will ask: why do you need this particular way?

I don't know; I just saw it a lot on the certification exam, so I guess it's a best practice?

Furthermore, the fact that it doesn't work is really, really annoying me.

Does anyone have an idea why it doesn't work?

Thank you!

Have a good day

4 REPLIES

etsyal1e2r3
Honored Contributor

You should try using PySpark in all of your locations to verify with:

df = spark.sql("select * from <catalog.schema.table>")
df.display()

Do this after you make your managed table in your desired external location path, of course.

spark.sql("create schema if not exists <schema name> managed location '<external location url path>'")
spark.sql("create table if not exists <schema name>.<table name>")  # the table inherits the schema's managed location

https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-schema.html
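
A filled-in version of those two statements might look like this (catalog, schema, table, and URL are placeholder names, not from the thread; the table lands under the schema's managed location):

%python
spark.sql("CREATE SCHEMA IF NOT EXISTS main.raw MANAGED LOCATION 'abfss://container@accstore.dfs.core.windows.net/raw'")
spark.sql("CREATE TABLE IF NOT EXISTS main.raw.events (id INT, body STRING)")  # managed table, no LOCATION clause

df = spark.sql("SELECT * FROM main.raw.events")
df.display()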

Heikko
New Contributor III

Same here,

I have created a storage credential.

I have created the external location. 

I can use COPY INTO to copy data from the external location.

I can use the Load data UI from Azure.

What I cannot do is use spark.read or dbutils.fs.ls on the external location; it fails with "Invalid configuration value detected for fs.azure.account.key".
I mean, why do I need to set the properties if it works elsewhere? Isn't that the point of creating the storage credential in the first place?

Heikko
New Contributor III

So after attending the Office Hours I came to realise that, apparently, external locations are not meant to be used like that and don't support Spark. For that you still need to access the locations via the standard access methods, and the storage credential is of no use there.

HOWEVER, having said that, there is a new feature now in public preview called Volumes that, surprise surprise, also comes as external volumes. And imagine that: Spark works, dbutils works, etc. So I guess if your use case involves easily accessing file-based content from external storage, this seems to be the way to go. On top of that, you can use Unity Catalog for managing access and permissions. Case closed.
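
A sketch of that Volumes route, with placeholder catalog/schema/volume names and URL:

%python
# Create an external volume over the external location, then use ordinary
# file APIs against the /Volumes path (all names below are placeholders).
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS main.raw.landing
    LOCATION 'abfss://container@accstore.dfs.core.windows.net/landing'
""")
dbutils.fs.ls("/Volumes/main/raw/landing")
df = spark.read.csv("/Volumes/main/raw/landing", header=True)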

Debayan
Databricks Employee

Hi,

If we go by the error,

Invalid configuration value detected for fs.azure.account.key

A storage account access key cannot be used to access data over the abfss protocol here. Please refer to this: https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#access-adls-gen2-directly
The required parameters are described in that documentation.
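
Sketched out, the direct-access settings from that page look like the following (every value is a placeholder; the client secret should come from a secret scope rather than be hard-coded):

%python
# Service-principal OAuth configuration for the abfss driver, per the linked doc.
storage_account = "accstore"  # placeholder
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
               dbutils.secrets.get(scope="<scope>", key="<service-credential-key>"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")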
Please tag @Debayan in your next comment, which will notify me. Thanks!
