Data Governance

Hive metastore CANNOT access storage

cobba16
New Contributor II

I'm new to Azure Databricks and I'm facing an issue when trying to create a schema or table that points to my Azure Storage account. I keep getting this error:

```
[EXTERNAL_METASTORE_CLIENT_ERROR.OPERATION_FAILED] Client operation failed: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException Operation failed: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.", 403, HEAD, , https://<storage_acc_name>.dfs.core.windows.net/data/?upn=false&action=getAccessControl&timeout=90) SQLSTATE: 58000
```

Here's what I've done so far:

  1. I can successfully list files in my storage using dbutils.fs.ls() with my storage account key

  2. I've tried granting access control roles in my storage account

  3. But when I run CREATE SCHEMA with a LOCATION pointing to my storage, it fails


```
# This works:
dbutils.fs.ls("abfss://data@<storage_acc_name>.dfs.core.windows.net/")
# Returns: [FileInfo(path='abfss://.../hotel-weather/', ...)]

# This fails:
spark.sql("CREATE SCHEMA IF NOT EXISTS bronze LOCATION 'abfss://data@<storage_acc_name>.dfs.core.windows.net/bronze'")
```

It seems like Databricks can read from my storage, but cannot write/create schemas there. Has anyone faced this before? What permissions am I missing?

What I've tried:

  • Added Storage Blob Data Contributor role to various identities

  • Verified my storage key is correct (since listing works)

2 REPLIES 2

Commitchell
Databricks Employee

Hi Cobba16,

A couple thoughts for you.

  1. It looks like you're setting the credentials in a Spark config from within a notebook. That's why read operations execute in your session context, but when you try to create the schema, the Hive metastore authenticates in its own context.
  2. Is there a specific reason you're using HMS? The Hive metastore is no longer the best practice; you should try Unity Catalog instead. I think you'll find it more straightforward.

Check out the documentation here:

https://docs.databricks.com/aws/en/connect/unity-catalog/cloud-storage/
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-external-locations

SteveOstrowski
Databricks Employee

Hi @cobba16,

This is a common scenario on Azure Databricks when using the legacy hive_metastore with external storage. The root cause is that dbutils.fs.ls and Spark read operations use the credentials you set in the Spark session config, but the Hive metastore uses a separate authentication path when it needs to validate a LOCATION during CREATE SCHEMA or CREATE TABLE operations. That separate path is what is returning the 403.

Here is how this happens and how to fix it.

WHY DBUTILS.FS.LS WORKS BUT CREATE SCHEMA FAILS

When you run dbutils.fs.ls("abfss://..."), Databricks uses the Hadoop filesystem configuration you set in the Spark session (e.g., the storage account access key via fs.azure.account.key.<account>.dfs.core.windows.net). This goes directly from the cluster to Azure Storage.

When you run CREATE SCHEMA ... LOCATION 'abfss://...', the Hive metastore needs to verify and record that location. The metastore itself makes a HEAD request to the storage path to validate it exists. This server-side call uses a different credential chain than your Spark session config, which is why you get the 403 "Server failed to authenticate the request" error even though direct reads work fine.
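To make the two credential paths concrete, here is a small sketch. The helper builds the Hadoop config key that holds an ADLS Gen2 account access key; the account name, secret scope, and key names in the comments are hypothetical placeholders, not values from your environment.

```python
def abfs_key_conf(storage_account: str) -> str:
    """Hadoop config key holding the ADLS Gen2 account access key."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"

print(abfs_key_conf("mystorageacct"))
# -> fs.azure.account.key.mystorageacct.dfs.core.windows.net

# In a Databricks notebook, setting it on the *session* only helps
# Spark/dbutils calls, not the metastore's own validation request:
#   spark.conf.set(abfs_key_conf("mystorageacct"),
#                  dbutils.secrets.get("my-scope", "storage-key"))
```

Setting the same key with a `spark.hadoop.` prefix in the cluster Spark config (Option 1 below) makes it available from cluster startup, which is what the metastore path needs.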

HOW TO FIX THIS

There are a few approaches depending on your setup:

OPTION 1: SET CREDENTIALS IN THE CLUSTER SPARK CONFIG (NOT JUST THE NOTEBOOK)

If you are setting the storage key only in your notebook code, the Hive metastore may not pick it up for its validation call. Instead, set the credentials in the cluster's Spark configuration (under Compute > your cluster > Advanced options > Spark config):

spark.hadoop.fs.azure.account.key.<storage_account>.dfs.core.windows.net {{secrets/<scope>/<key>}}

This ensures the credentials are available to both Spark and the Hive metastore from cluster startup.

OPTION 2: USE AN INIT SCRIPT OR CLUSTER-SCOPED HADOOP CONFIG

You can also set the Hadoop configuration at the cluster level using the cluster's Hadoop configuration section or an init script. This ensures the credentials are propagated before any metastore operations occur.

OPTION 3: MIGRATE TO UNITY CATALOG (RECOMMENDED)

The recommended long-term approach is to use Unity Catalog with external locations and storage credentials instead of the legacy hive_metastore. Unity Catalog provides a centralized way to manage storage access using Azure managed identities or service principals, and it handles authentication consistently for all operations.

With Unity Catalog you would:
1. Create an Access Connector for Azure Databricks in Azure Portal
2. Assign the Storage Blob Data Contributor role on your ADLS Gen2 account to the managed identity
3. Create a storage credential in Unity Catalog referencing the access connector
4. Create an external location in Unity Catalog pointing to your storage path
5. Create schemas and tables under a Unity Catalog catalog that references the external location
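Steps 4 and 5 above can be sketched as SQL run from a notebook. This assumes step 3 is already done (a storage credential, here hypothetically named `my_cred`, typically created in Catalog Explorer); `my_ext_loc`, the catalog `main`, and the storage account name are also placeholders.

```python
# Hypothetical names: replace my_ext_loc, my_cred, main, and the
# storage account with your own values.
uc_setup = [
    # Step 4: external location bound to the storage credential.
    """CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_loc
       URL 'abfss://data@mystorageacct.dfs.core.windows.net/'
       WITH (STORAGE CREDENTIAL my_cred)""",
    # Step 5: schema whose managed location sits under that path.
    """CREATE SCHEMA IF NOT EXISTS main.bronze
       MANAGED LOCATION 'abfss://data@mystorageacct.dfs.core.windows.net/bronze'""",
]
# In a Unity Catalog-enabled Databricks notebook:
#   for stmt in uc_setup:
#       spark.sql(stmt)
```

Because Unity Catalog authenticates through the storage credential for every operation, the session-vs-metastore credential split that caused the original 403 no longer applies.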

Documentation: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/azure-managed...

VERIFYING YOUR CURRENT SETUP

To help narrow this down, you can check what credentials the Hive metastore is actually using by running:

spark.conf.get("spark.hadoop.fs.azure.account.key.<storage_account>.dfs.core.windows.net")

If this returns empty or throws an error, the credential is not available at the cluster level, which confirms the issue.

You can also check if the storage key is set only at the notebook level vs. the cluster level by looking at the cluster's Spark configuration page in the Compute UI.
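The check above can be wrapped in a small helper that distinguishes "key not set" from "key set", since `spark.conf.get` raises when a key is absent. This is a sketch: `spark` only exists inside a Databricks notebook, so the demo below uses a dict-backed getter in its place.

```python
def conf_is_set(get_conf, key: str) -> bool:
    """True if the config getter yields a non-empty value for key.

    get_conf is any callable like spark.conf.get that raises (or
    returns an empty string) when the key is missing.
    """
    try:
        value = get_conf(key)
    except Exception:
        return False
    return bool(value)

# Standalone demo; in Databricks, pass spark.conf.get instead.
fake_conf = {"spark.hadoop.fs.azure.account.key.acct.dfs.core.windows.net": "xxx"}
print(conf_is_set(fake_conf.__getitem__,
                  "spark.hadoop.fs.azure.account.key.acct.dfs.core.windows.net"))  # True
print(conf_is_set(fake_conf.__getitem__, "missing.key"))  # False
```

If this returns False on your cluster for the account key, the credential is notebook-scoped only, which confirms the diagnosis.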

ADDITIONAL NOTES

- If you are using a SAS token instead of an account key, make sure the SAS token has the correct permissions (read, write, delete, list, create) and that it is set in the cluster Spark config, not just in notebook code.
- If a storage firewall is enabled on the ADLS Gen2 account, make sure the Databricks workspace VNet/subnets are allowlisted. Firewall rules can cause 403 errors that look like authentication failures.
- If you recently rotated storage keys, make sure the new key is updated in your Databricks secret scope and that the cluster has been restarted to pick up the change.

Documentation references:
- Access Azure Data Lake Storage using Azure credentials: https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage
- Hive metastore configuration: https://learn.microsoft.com/en-us/azure/databricks/archive/external-metastores/external-hive-metasto...
- Unity Catalog external locations: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage

* This reply was drafted with an agent system I built, which researches and drafts responses based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.