Data Engineering

Creating a Hive DB in Azure Databricks with the LOCATION parameter set to an ADLS Gen2 account fails

Mihai_Cog
Contributor

Hello,

I am trying to create a database in Azure Databricks using an abfss LOCATION in the CREATE DATABASE statement, and it throws an exception.

%sql

CREATE DATABASE IF NOT EXISTS test COMMENT "Database for Test Area" LOCATION "abfss://test@storagetemp.dfs.core.windows.net/database/"

The error from creating the database is:

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.KeyProviderException Failure to initialize configuration for storage account storagetemp.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key)

Setting these configurations in a notebook and running it does not help. Any idea?

I added:

service_credential = dbutils.secrets.get(scope="<secret-scope>",key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")

spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")

spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")

spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)

spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

in a notebook, and then I ran that notebook at the beginning of the MAIN notebook using the magic command:

%run "./conn"

I should mention that I CAN do .save() and dbutils.fs.ls(location).
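For example, both of these work from the same notebook (same placeholder path as above; the Delta write target is just for illustration):

# Same abfss path as in the CREATE DATABASE statement above
location = "abfss://test@storagetemp.dfs.core.windows.net/database/"

# Listing works
display(dbutils.fs.ls(location))

# Writing works too (illustrative Delta path)
spark.range(10).write.format("delta").mode("overwrite").save(location + "smoke_test")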

BUT, if I add in the Cluster Spark Config:

fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth

fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider

fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>

fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope>/<service-credential-key>}}

fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token

IT IS WORKING.

What am I doing wrong?

Thank you.

14 REPLIES

-werners-
Esteemed Contributor III

is your workspace Unity enabled by any chance? Because Unity Catalog ignores Spark configuration settings when accessing data managed by external locations.

Nope. It is not.

-werners-
Esteemed Contributor III

strange.  I don't see the issue.  Perhaps a typo somewhere?

Can you test by putting the config directly in the notebook (instead of using %run)?
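For example, everything in one cell, with the placeholders from your post:

service_credential = dbutils.secrets.get(scope="<secret-scope>", key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

# Run the failing statement in the same cell, so nothing depends on %run
spark.sql('CREATE DATABASE IF NOT EXISTS test LOCATION "abfss://test@storagetemp.dfs.core.windows.net/database/"')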

Done, and it is not working.

It only works with the configuration in the Cluster Spark Config.

I really don't understand.

-werners-
Esteemed Contributor III

According to the docs it should work:

https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#azureserviceprincipal

You are using a Service Principal, I suppose?

Yes, Service Principal.

-werners-
Esteemed Contributor III

The only things that come to mind are either:

- a typo/wrong value passed

- permission (so the spark conf is not actually updated); you can verify with a quick read-back, see the sketch below
Because it should work.
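A quick way to check that, with your placeholder account name:

# Read back the session conf right before CREATE DATABASE; expect "OAuth"
key = "fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net"
print(spark.conf.get(key, "NOT SET"))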

I checked the notebook against what I have in the Spark config: NO typo.

If it works when I set those in the Spark Config, it should also work from the notebook.

The only thing that does not work with these settings in the notebook is CREATE DATABASE. Other things like .save(), dbutils.fs.ls(), and .write() are working.

It is something else.

-werners-
Esteemed Contributor III

can you, for test purposes, mount the external location to dbfs?
and then as the path you use /mnt/<mount>/...
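Something along these lines, with the same placeholders as before (the mount point name is just an example):

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<secret-scope>", key="<service-credential-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token",
}

# Mount the container, then use /mnt/test/... as the path
dbutils.fs.mount(
    source="abfss://test@storagetemp.dfs.core.windows.net/",
    mount_point="/mnt/test",
    extra_configs=configs,
)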

Already did, and it works properly.

But I cannot use mounting anymore, because Databricks has announced that mounts are not recommended and will soon be deprecated.

-werners-
Esteemed Contributor III

correct (mainly for Unity though).
Hey, how about you try CREATE SCHEMA instead of CREATE DATABASE? It should not make any difference, but since we are stuck anyway...
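i.e. something like:

spark.sql('CREATE SCHEMA IF NOT EXISTS test LOCATION "abfss://test@storagetemp.dfs.core.windows.net/database/"')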

Tried, same error message.

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.KeyProviderException Failure to initialize configuration for storage account storagetemp.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key)

-werners-
Esteemed Contributor III

ok.

Can you check this link?

Because I suspect something is wrong, and so dbrx falls back to using the account key as a default (fs.azure.account.key).
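If you want to rule that out, you could try feeding it the account key directly and see whether the error changes (the <account-key> secret name here is just an example):

# Test only: supply the storage account key the error is complaining about
account_key = dbutils.secrets.get(scope="<secret-scope>", key="<account-key>")
spark.conf.set("fs.azure.account.key.storagetemp.dfs.core.windows.net", account_key)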

Thank you for the link; it is very useful, although not in my case.

Everything is set as it should be.

Still not working.
