06-13-2023 07:18 AM
Hello everyone,
I am really new to Databricks; I just passed my Apache Spark developer certification on it, and I also have a certification in data engineering with Azure. Those are fancy words, but I only started doing real, deep work with these tools when I began a personal project I'm really excited about.
My issue is with accessing the storage account through Databricks using a managed identity. Concretely:
1/ Created an access connector for Databricks.
2/ Created a metastore, linked it to the access connector, and linked the metastore to my Databricks workspace.
3/ Created a storage credential and an external location (see the sketch after this list).
4/ I could query the container with two different methods, but not the last one.
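For step 3, the external location part looked roughly like this. This is only a sketch: the names are placeholders, and I created the storage credential itself from the access connector in the Catalog Explorer UI.
%python
# Sketch with placeholder names: "mi_credential" is the storage credential that
# wraps the access connector's managed identity; the external location binds
# the container path to that credential.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS raw_location
  URL 'abfss://container@accstore.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL mi_credential)
""")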
So far I have tried two ways that work just fine:
1/
%sql
CREATE TABLE raw.table;
COPY INTO raw.table
FROM 'abfss://container@accstore.dfs.core.windows.net/'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true')
2/ works perfectly:
%python
df = spark.read.schema(schema).csv(
    "abfss://raw@twitterdatalake.dfs.core.windows.net/",
    header=True, escape='"', quote='"', multiLine=True)  # inferSchema=True
3/ doesn't work:
%sql
DROP TABLE IF EXISTS raw.table;
CREATE EXTERNAL TABLE raw.table
USING CSV
OPTIONS (path 'abfss://raw@accstore.dfs.core.windows.net/', header 'true', inferSchema 'true');
FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART@accsstore.dfs.core.windows.net/file.csv.
Caused by: KeyProviderException: Failure to initialize configuration for storage account twitterdatalake.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key
Caused by: InvalidConfigurationValueException: Invalid configuration value detected for fs.azure.account.key
Yes, I know you will ask why I need this particular way. I don't know; I just saw it a lot while preparing for the certification exam, so I guess it's a best practice? Furthermore, the fact that it doesn't work is really, really annoying me.
Does anyone have an idea why it doesn't work?
Thank you!
Have a good day
06-13-2023 07:13 PM
You should try using PySpark in all of your locations to verify, with
df = spark.sql("select * from <catalog.schema.table>")
df.display()
Do this after you make your managed table in your desired external location path, of course.
spark.sql("create schema if not exists <schema name> managed location <external location url path>")
spark.sql("create table if not exists <schema name>.<table name> managed location <external location url path>")
https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-schema.html
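Put together, the flow might look like this in a notebook (the catalog, schema, and table names below are hypothetical; the point is that you address the table through Unity Catalog instead of by its abfss:// path):
%python
# Hypothetical names throughout: create the schema over the external location,
# then a managed table inside it, and query it by name rather than by path.
spark.sql("CREATE SCHEMA IF NOT EXISTS main.raw "
          "MANAGED LOCATION 'abfss://raw@accstore.dfs.core.windows.net/managed'")
spark.sql("CREATE TABLE IF NOT EXISTS main.raw.tweets (id BIGINT, text STRING)")
df = spark.sql("SELECT * FROM main.raw.tweets")
df.display()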
06-29-2023 08:55 AM
Same here.
I have created a storage credential.
I have created the external location.
I can use COPY INTO to copy data from the external location.
I can use the Load data UI from Azure.
What I cannot do is use spark.read or dbutils.fs.ls on the external location; it fails with "Invalid configuration value detected for fs.azure.account.key".
I mean, why do I need to set the properties if it works elsewhere? Isn't that the point of creating the storage credential after all?
07-25-2023 05:12 AM - edited 07-25-2023 05:13 AM
So after attending the Office Hours I came to realise that apparently external locations are not meant to be used like that and don't support direct Spark access. For that you still need to access the locations with the standard access methods, and the storage credential doesn't help.
HOWEVER, having said that, there is a new feature now in public preview called Volumes that, surprise surprise, also comes with external volumes. And imagine that: Spark works, dbutils works, etc. (see the sketch below). So I guess if your use case involves easily accessing file-based content from external storage, this seems to be the way to go. On top of that, you can use Unity Catalog for managing access and permissions. Case closed.
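For anyone finding this later, here is a sketch of what I mean, with hypothetical catalog/schema/volume names:
%python
# Hypothetical names: an external volume created over the external location.
# Once it exists, both dbutils and spark.read work against the /Volumes path.
spark.sql("CREATE EXTERNAL VOLUME IF NOT EXISTS main.raw.landing "
          "LOCATION 'abfss://raw@accstore.dfs.core.windows.net/landing'")
display(dbutils.fs.ls("/Volumes/main/raw/landing"))
df = spark.read.csv("/Volumes/main/raw/landing", header=True)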
07-26-2023 12:42 AM
Hi,
If we go by the error:
Invalid configuration value detected for fs.azure.account.key
the storage account access key needed to access data over the abfss protocol is missing or invalid. Please refer to https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#access-adls-gen2-directly
The required parameters are listed in that documentation.
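For example, direct access with an account key stored in a secret scope looks roughly like this (the scope and key names below are placeholders):
%python
# Placeholder secret scope/key names. Per the linked docs, reading an abfss://
# path directly requires the account key (or a service principal OAuth config)
# to be set in the Spark conf for that storage account.
spark.conf.set(
    "fs.azure.account.key.twitterdatalake.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"))
df = spark.read.csv("abfss://raw@twitterdatalake.dfs.core.windows.net/", header=True)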
Please tag @Debayan in your next comment, which will notify me. Thanks!