Hi @ajay_wavicle,
Good timing on this question. Connecting Azure storage accounts to Databricks using a User-Assigned Managed Identity is a great approach -- it avoids the need to manage secrets and supports storage firewall configurations. Here is a complete walkthrough covering the Azure Portal setup and the Databricks side.
OVERVIEW
In Azure Databricks (with Unity Catalog), you connect to Azure Data Lake Storage Gen2 through three key objects:
1. Access Connector for Azure Databricks -- an Azure resource that holds a managed identity
2. Storage Credential -- a Unity Catalog object that references the access connector
3. External Location -- a Unity Catalog object that maps a storage path (abfss://) to a storage credential
You can use either a system-assigned or a user-assigned managed identity. A user-assigned managed identity, which is what you asked about, gives you more control: you create and manage it independently of the access connector, and you can reuse the same identity across multiple access connectors.
PREREQUISITES
- An Azure Data Lake Storage Gen2 account (must have hierarchical namespace enabled)
- Contributor or Owner role on an Azure resource group (to create the access connector)
- Owner or User Access Administrator role on the storage account (to assign IAM roles)
- A Databricks workspace enabled for Unity Catalog
- CREATE STORAGE CREDENTIAL privilege on your Unity Catalog metastore (account admins and metastore admins have this by default)
STEP 1: CREATE A USER-ASSIGNED MANAGED IDENTITY IN AZURE
If you do not already have one:
1. In the Azure Portal, search for "Managed Identities" and click Create.
2. Select your subscription, resource group, and region. The region should match your storage account region for best performance.
3. Give it a meaningful name (e.g., "databricks-storage-identity").
4. Click Review + Create, then Create.
5. Once created, go to the resource and copy the Resource ID. It will look like:
/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>
STEP 2: CREATE AN ACCESS CONNECTOR FOR AZURE DATABRICKS
The Access Connector is a first-party Azure resource that lets you connect managed identities to an Azure Databricks account.
1. In the Azure Portal, click "+ Create a resource".
2. Search for "Access Connector for Azure Databricks" and select it.
3. Click Create.
4. Fill in the Basics tab:
- Subscription: your Azure subscription
- Resource Group: select an appropriate resource group
- Name: a descriptive name (e.g., "my-databricks-access-connector")
- Region: same region as your storage account
5. Click Next to go to the Managed Identity tab.
6. Under User-assigned managed identity, click "+ Add" and select the managed identity you created in Step 1.
7. Click Review + Create, then Create.
8. Once deployed, go to the resource and copy the Resource ID. It will look like:
/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
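Before pasting these IDs into Databricks, it can be worth a quick sanity check that both resource IDs follow the expected shape. The helper below is just an illustration I put together, not part of any Azure SDK:

```python
import re

# Hypothetical helper: checks that an Azure resource ID has the
# provider/type segments expected in this walkthrough.
def matches_resource_type(resource_id, provider, resource_type):
    pattern = (
        r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
        rf"/providers/{re.escape(provider)}/{re.escape(resource_type)}/[^/]+$"
    )
    return re.match(pattern, resource_id) is not None

# Placeholder IDs in the shape produced by Steps 1 and 2.
identity_id = (
    "/subscriptions/1111/resourceGroups/my-rg/providers"
    "/Microsoft.ManagedIdentity/userAssignedIdentities/databricks-storage-identity"
)
connector_id = (
    "/subscriptions/1111/resourceGroups/my-rg/providers"
    "/Microsoft.Databricks/accessConnectors/my-databricks-access-connector"
)

print(matches_resource_type(identity_id, "Microsoft.ManagedIdentity", "userAssignedIdentities"))  # True
print(matches_resource_type(connector_id, "Microsoft.Databricks", "accessConnectors"))            # True
```

A mistyped segment (for example, pasting the identity ID where the connector ID belongs) is one of the most common causes of credential creation failures, so this kind of check can save a round trip.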
STEP 3: GRANT THE MANAGED IDENTITY ACCESS TO YOUR STORAGE ACCOUNT
1. In the Azure Portal, navigate to your Azure Data Lake Storage Gen2 account.
2. Go to Access Control (IAM) and click "+ Add" then "Add role assignment".
3. Select the "Storage Blob Data Contributor" role (this grants read and write access). Click Next.
4. Under "Assign access to", select "Managed identity".
5. Click "+ Select members".
6. In the managed identity dropdown, select "User-assigned managed identity".
7. Search for and select your managed identity from Step 1.
8. Click Select, then Review + Assign.
Note: If you only need read access, you can use "Storage Blob Data Reader" instead. For finer-grained control, you can assign "Storage Blob Delegator" at the storage account level and "Storage Blob Data Contributor" at a specific container level.
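If you go the finer-grained route, the role assignment scope for a single container nests under the storage account's blob service. As a rough sketch (the helper and names are mine, not an Azure API), the scope strings look like this:

```python
# Hypothetical helper: builds the ARM scope string for a role assignment
# at either the storage-account level or a single-container level.
def storage_scope(sub_id, rg, account, container=""):
    scope = (
        f"/subscriptions/{sub_id}/resourceGroups/{rg}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )
    if container:
        # A container's ARM resource ID nests under the account's blob service.
        scope += f"/blobServices/default/containers/{container}"
    return scope

# Account-level scope (e.g., for Storage Blob Delegator):
print(storage_scope("1111", "my-rg", "mystorageacct"))
# Container-level scope (e.g., for Storage Blob Data Contributor):
print(storage_scope("1111", "my-rg", "mystorageacct", "raw-data"))
```

Either string can be passed as the --scope argument to az role assignment create if you prefer the CLI over the portal for this step.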
STEP 4: CREATE A STORAGE CREDENTIAL IN DATABRICKS
Option A -- Using the Databricks UI (Catalog Explorer):
1. Log into your Databricks workspace.
2. Click the "Catalog" icon in the sidebar.
3. Click "External data" then go to the "Credentials" tab.
4. Click "Create credential" and select "Storage credential".
5. Set Credential Type to "Azure Managed Identity".
6. Enter a name for your credential.
7. In the Access Connector ID field, paste the access connector resource ID from Step 2.
8. In the "User-assigned managed identity ID" field, paste the managed identity resource ID from Step 1.
9. Click Create.
Option B -- Using SQL in a notebook:
CREATE STORAGE CREDENTIAL my_storage_cred
WITH (
  AZURE_MANAGED_IDENTITY (
    ACCESS_CONNECTOR_ID = '/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>',
    MANAGED_IDENTITY_ID = '/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>'
  )
);
Option C -- Using the Databricks CLI:
databricks storage-credentials create --json '{
  "name": "my_storage_cred",
  "azure_managed_identity": {
    "access_connector_id": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Databricks/accessConnectors/<connector-name>",
    "managed_identity_id": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>"
  }
}'
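Quoting that JSON inline in a shell is easy to get wrong. One option is to build the payload in Python and let json.dumps guarantee well-formed JSON before passing it to the CLI; the snippet below is just a sketch with the same placeholder IDs as above:

```python
import json

# Assemble the payload for `databricks storage-credentials create --json`.
# The <...> segments are placeholders from the earlier steps.
payload = {
    "name": "my_storage_cred",
    "azure_managed_identity": {
        "access_connector_id": (
            "/subscriptions/<sub-id>/resourceGroups/<rg>"
            "/providers/Microsoft.Databricks/accessConnectors/<connector-name>"
        ),
        "managed_identity_id": (
            "/subscriptions/<sub-id>/resourceGroups/<rg>"
            "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>"
        ),
    },
}

# json.dumps always emits valid JSON, so there are no quoting surprises.
print(json.dumps(payload, indent=2))
```

You can write the output to a file and pass it with --json @payload.json, which sidesteps shell quoting entirely.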
STEP 5: CREATE AN EXTERNAL LOCATION
An external location maps a specific storage path to your storage credential so Unity Catalog can govern access.
Option A -- Using the UI:
1. In Catalog Explorer, click "External data" then the "External Locations" tab.
2. Click "Create location".
3. Set Storage type to "Azure Data Lake Storage".
4. In the URL field, enter your container path in abfss format:
abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<optional-path>
5. Select the storage credential you created in Step 4.
6. Click Create.
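The abfss URL is another spot where typos creep in (swapping the container and account names is a classic). As a rough illustration, you can compose and eyeball it like this; the helper is not a Databricks API, just a sketch:

```python
# Hypothetical helper: composes an abfss:// URL for ADLS Gen2 from its parts.
def abfss_url(container, account, path=""):
    url = f"abfss://{container}@{account}.dfs.core.windows.net"
    if path:
        url += "/" + path.strip("/")
    return url

print(abfss_url("raw-data", "mystorageacct", "landing/2024"))
# abfss://raw-data@mystorageacct.dfs.core.windows.net/landing/2024
```

Note the order: the container name comes before the @ and the storage account name after it.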
Option B -- Using SQL:
CREATE EXTERNAL LOCATION my_ext_location
URL 'abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>'
WITH (STORAGE CREDENTIAL my_storage_cred);
After creating the external location, grant appropriate permissions to users:
GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION my_ext_location TO `user_or_group`;
-- Or, to allow creating external tables:
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION my_ext_location TO `user_or_group`;
STEP 6: VALIDATE YOUR SETUP
You can validate access by listing files:
LIST 'abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>';
Or by reading a file:
SELECT * FROM read_files('abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>/myfile.csv');
REGARDING AZURE CLI
You mentioned wanting to use Azure CLI. The Azure CLI is useful for the Azure-side setup (Steps 1-3). Here are the key commands:
Create the user-assigned managed identity:
az identity create \
--name databricks-storage-identity \
--resource-group <resource-group> \
--location <region>
Create the access connector with user-assigned identity:
az databricks access-connector create \
--name my-databricks-access-connector \
--resource-group <resource-group> \
--location <region> \
--identity-type UserAssigned \
--user-assigned-identities '{"/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>": {}}'
Assign the Storage Blob Data Contributor role:
az role assignment create \
--assignee-object-id $(az identity show --name <identity-name> --resource-group <rg> --query principalId -o tsv) \
--assignee-principal-type ServicePrincipal \
--role "Storage Blob Data Contributor" \
--scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>
For the Databricks side (Steps 4-6), you can use the Databricks CLI as shown above, or the SQL commands from a notebook.
IMPORTANT NOTES
- Your Azure Data Lake Storage Gen2 account MUST have hierarchical namespace enabled. Standard Blob Storage accounts will not work with Unity Catalog.
- For best performance, keep the access connector, managed identity, storage account, and Databricks workspace in the same Azure region.
- Managed identities are strongly recommended over service principals because they do not require secret rotation, and they support storage firewall configurations (network-restricted storage accounts).
- If your storage account has a firewall enabled, you can configure trusted access by adding the access connector as a resource instance under your storage account's Networking settings. This is a major advantage of managed identity over service principal.
DOCUMENTATION REFERENCES
- Azure managed identities in Unity Catalog: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/azure-managed...
- Create a storage credential: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/storage-crede...
- Create an external location: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/external-loca...
- Connect to cloud storage using Unity Catalog: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/
Hope this helps get you connected! Let me know if you run into any issues with a specific step.
* This reply was drafted with an agent system I built, which researches and drafts responses from the documentation I have available and from previous memory. I personally review each draft for obvious issues, monitor the system for reliability, and update the reply when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.