Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Azure Databricks Unity Catalog - cannot access managed volume in notebook

AlbertWang
Contributor III

We have set up Azure Databricks with Unity Catalog (Metastore).

  • Used Managed Identity (Databricks Access Connector) for connection from workspace(s) to ADLS Gen2
  • Granted the Storage Blob Data Contributor and Storage Queue Data Contributor roles to the Databricks Access Connector at the storage account level on the ADLS Gen2 account
  • Used Storage Credentials and External Location
  • I am a Databricks account admin and the metastore admin
  • Created a catalog and mapped it to an External Location (ADLS Gen2 container)
  • Created a schema and mapped it to an External Location (ADLS Gen2 container)
  • Created a volume
  • I can upload files to the volume in the Databricks Portal
  • I can browse the files in the volume in the Databricks Portal
  • Everything (workspaces, ADLS Gen2, etc) is in the same Azure region

When I run `dbutils.fs.ls("/Volumes/catalog1/schema1/volume1")`, I get the error: `Operation failed: "This request is not authorized to perform this operation.", 403, GET`.
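As a minimal sketch of how the failure shows up from a notebook cell (using the volume path above; the try/except is only for illustration), something like this surfaces the underlying storage error:

```python
# Minimal repro sketch: list the managed volume and print the underlying
# storage error when access is denied.
volume_path = "/Volumes/catalog1/schema1/volume1"

try:
    for f in dbutils.fs.ls(volume_path):
        print(f.path, f.size)
except Exception as e:
    # With the storage firewall blocking the request, this prints the
    # 403 "This request is not authorized to perform this operation." error.
    print(f"Failed to list {volume_path}: {e}")
```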
 
Any help will be appreciated!
1 ACCEPTED SOLUTION

Accepted Solutions

AlbertWang
Contributor III

I found the cause and a workaround, but I feel this is a bug, and I wonder what the best practice is.

When I set the ADLS Gen2 account's Public network access to "Enabled from all networks", as shown below, I can access the volume from a notebook.

[Screenshot: storage account networking, Public network access set to "Enabled from all networks"]

However, if I set the ADLS Gen2 account's Public network access to "Enabled from selected virtual networks and IP addresses", as shown below, I cannot access the volume from a notebook, even though I added the VM's public IP address to the allow list, added Microsoft.Databricks/accessConnectors to the resource instances, and enabled the exception "Allow Azure services on the trusted services list to access this storage account". As I understand it, since my compute has the Unity Catalog badge, it should reach the ADLS Gen2 account through the Access Connector for Azure Databricks (managed identity), so the request should be authorized.

[Screenshots: storage account networking set to "Enabled from selected virtual networks and IP addresses", with the access connector added as a resource instance and the trusted-services exception enabled]
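For reference, a rough sketch of that "selected virtual networks and IP addresses" configuration done programmatically with the azure-mgmt-storage SDK. The subscription ID, resource group, tenant ID, and VM IP below are placeholders, and the exact model fields may differ between SDK versions:

```python
# Hedged sketch (placeholder IDs): put the storage account in "selected
# networks" mode while allowing the Databricks Access Connector as a
# resource instance and keeping the trusted-services exception enabled.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import (
    IPRule,
    NetworkRuleSet,
    ResourceAccessRule,
    StorageAccountUpdateParameters,
)

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder
access_connector_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    "/providers/Microsoft.Databricks/accessConnectors/ac_for_dbr"
)

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)
client.storage_accounts.update(
    resource_group,
    "adslgen2a",
    StorageAccountUpdateParameters(
        network_rule_set=NetworkRuleSet(
            default_action="Deny",            # "selected networks" mode
            bypass="AzureServices",           # trusted-services exception
            ip_rules=[IPRule(ip_address_or_range="<vm-public-ip>")],
            resource_access_rules=[
                ResourceAccessRule(
                    tenant_id="<tenant-id>",  # placeholder
                    resource_id=access_connector_id,
                )
            ],
        )
    ),
)
```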


4 REPLIES

AlbertWang
Contributor III

I would like to provide more details about our architecture. I followed the steps below (a sketch of the corresponding notebook SQL follows the list).

  • I am the Azure Databricks account admin and the metastore admin
  • Created an Azure Databricks Workspace (Premium Tier)
  • Created a Databricks Metastore, named metastore1
  • Created an Azure ADLS Gen2 storage account (Hierarchical namespace enabled), named adslgen2a
  • Created an Azure Access Connector for Azure Databricks (an Azure Managed Identity), named ac_for_dbr
  • On adslgen2a, assigned the Storage Blob Data Contributor and Storage Queue Data Contributor roles to ac_for_dbr
  • Created two ADLS Gen2 containers under adslgen2a: one named adslgen2_default_container, the other named adslgen2_schema1_container
  • Created a Databricks Storage Credential, named dbr_strg_cred, whose connector ID is the resource ID of ac_for_dbr. The Permissions of the Storage Credential were not set (empty).
  • Created two Databricks External Locations, both using dbr_strg_cred.
  • One external location, named dbr_ext_loc_catalogdefault, points to the ADLS Gen2 container adslgen2_default_container. The Permissions of the External Location were not set (empty).
  • The other, named dbr_ext_loc_schema1, points to the ADLS Gen2 container adslgen2_schema1_container. The Permissions of the External Location were not set (empty).
  • Created a Databricks Catalog, named catalog1, under metastore1. Set dbr_ext_loc_catalogdefault as this catalog's Storage Location.
  • Created a Databricks Schema, named schema1, under catalog1. Set dbr_ext_loc_schema1 as this schema's Storage Location.
  • Created a Databricks Volume, named volumn11, under schema1.
  • In the Databricks UI, I can upload/download files to/from volumn11.
  • However, when I created an All-purpose compute and ran `dbutils.fs.ls("/Volumes/catalog1/schema1/volumn11")`, I got the error: `Operation failed: "This request is not authorized to perform this operation.", 403, GET`
  • Details about the All-purpose compute
    • Type: Single node
    • Access mode: Single user
    • Single user access: myself
    • Runtime version: 14.3 LTS
    • Enable credential passthrough for user-level data access: disabled
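For reference, a minimal sketch of how the Unity Catalog objects listed above could be created from a notebook. The abfss URLs are assumptions derived from the account and container names above, and the storage credential dbr_strg_cred is assumed to already exist (it was created in the UI):

```python
# Hedged sketch: recreate the external locations, catalog, schema, and managed
# volume described above with SQL from a notebook.
account = "adslgen2a.dfs.core.windows.net"  # assumed DFS endpoint for adslgen2a

spark.sql(f"""
  CREATE EXTERNAL LOCATION IF NOT EXISTS dbr_ext_loc_catalogdefault
  URL 'abfss://adslgen2_default_container@{account}/'
  WITH (STORAGE CREDENTIAL dbr_strg_cred)
""")

spark.sql(f"""
  CREATE EXTERNAL LOCATION IF NOT EXISTS dbr_ext_loc_schema1
  URL 'abfss://adslgen2_schema1_container@{account}/'
  WITH (STORAGE CREDENTIAL dbr_strg_cred)
""")

spark.sql(f"""
  CREATE CATALOG IF NOT EXISTS catalog1
  MANAGED LOCATION 'abfss://adslgen2_default_container@{account}/'
""")

spark.sql(f"""
  CREATE SCHEMA IF NOT EXISTS catalog1.schema1
  MANAGED LOCATION 'abfss://adslgen2_schema1_container@{account}/'
""")

spark.sql("CREATE VOLUME IF NOT EXISTS catalog1.schema1.volumn11")

# Listing the volume reproduces the 403 error when the storage firewall
# blocks the request.
display(dbutils.fs.ls("/Volumes/catalog1/schema1/volumn11"))
```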

AlbertWang
Contributor III

Hi @Retired_mod, could you kindly help with this issue? We are trying Unity Catalog, but it does not work as expected.

AlbertWang
Contributor III

Just to check if anybody has any idea.

AlbertWang
Contributor III

I found the cause and a workaround, but I feel this is a bug, and I wonder what the best practice is.

When I set the ADLS Gen2 account's Public network access to "Enabled from all networks", as shown below, I can access the volume from a notebook.

[Screenshot: storage account networking, Public network access set to "Enabled from all networks"]

However, if I set the ADLS Gen2 account's Public network access to "Enabled from selected virtual networks and IP addresses", as shown below, I cannot access the volume from a notebook, even though I added the VM's public IP address to the allow list, added Microsoft.Databricks/accessConnectors to the resource instances, and enabled the exception "Allow Azure services on the trusted services list to access this storage account". As I understand it, since my compute has the Unity Catalog badge, it should reach the ADLS Gen2 account through the Access Connector for Azure Databricks (managed identity), so the request should be authorized.

[Screenshots: storage account networking set to "Enabled from selected virtual networks and IP addresses", with the access connector added as a resource instance and the trusted-services exception enabled]
