Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

External Locations to Azure Storage via Private Endpoint

Marthinus
New Contributor III

When working with Azure Databricks (with VNet injection) to connect securely to an Azure Storage account via private endpoint, there are a few places the storage account needs to be reachable from. The first is the VNet that Databricks is injected into, and that works well: connecting with a blob client in a notebook succeeds.
The second is serverless compute, which I set up following this guide: Configure private connectivity from serverless compute - Azure Databricks | Microsoft Learn. That doesn't seem to work: performing the same operation fails with `This request is not authorized to perform this operation.`
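For context, the check I'm running is roughly the following (storage account, container, and the auth wiring are placeholders, not the exact code from my workspace):

```python
# Minimal notebook check: list blobs in a container over the private endpoint.
# <storage-account> and <container> are placeholders; DefaultAzureCredential is
# one of several possible auth options here.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account_url = "https://<storage-account>.blob.core.windows.net"
client = BlobServiceClient(account_url, credential=DefaultAzureCredential())

container = client.get_container_client("<container>")
for blob in container.list_blobs():
    print(blob.name)

# On classic (VNet-injected) compute this succeeds; on serverless it fails with
# "This request is not authorized to perform this operation."
```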

But most importantly, none of this touches the control plane, which Unity Catalog uses for external locations, and I can't find any documentation on creating a private endpoint to the control plane at all.

Is there any guidance on how to create an external location against a private-endpoint-only Azure Storage account? Trying to create one fails with `Failed to access cloud storage: [AbfsRestOperationException] () exceptionTraceId=XXX` and no other details.

2 ACCEPTED SOLUTIONS

BigRoux
Databricks Employee
Here are some things to consider:
 
To securely connect Azure Databricks to an Azure Storage account via a private endpoint using Unity Catalog, here are the key considerations and steps, aligned with the documentation:
 
Setting up External Locations with Unity Catalog
  1. Use Managed Identities:
    • Unity Catalog supports storage credentials backed by Azure managed identities, which eliminate the need for secret rotation and can access storage accounts protected by network rules. Give the managed identity the "Storage Blob Data Contributor" or "Storage Blob Delegator" role on the Azure Storage account.
  2. Create Storage Credentials:
    • Create a storage credential linked to the managed identity. This serves as the authentication mechanism for the Azure Data Lake Storage Gen2 account.
  3. Define External Locations:
    • Link the storage credential to an external location that specifies a path within the storage account. Access to external locations is then controlled with ACLs in Unity Catalog (see the sketch after this list).
  4. Networking Configuration:
    • Ensure private endpoints are established for the storage account so it is reachable from the Databricks workspace. The private endpoints should cover the sub-resources required for your operations, typically dfs and blob.
  5. Verify Network Rules:
    • If public network access is disabled for the storage account, ensure the managed identity is added to the storage account's network-rule allow list, or enable Private Link connectivity instead.
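Once the storage credential exists, steps 3 and the grants look roughly like this from a notebook (location, credential, path, and principal names are placeholders):

```python
# Sketch: register an external location against an existing storage credential.
# Runs in a Databricks notebook where `spark` is predefined.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_location
  URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/<path>'
  WITH (STORAGE CREDENTIAL my_storage_credential)
""")

# Access is then governed by Unity Catalog grants rather than cluster-level auth.
spark.sql("""
  GRANT READ FILES, WRITE FILES
  ON EXTERNAL LOCATION my_ext_location TO `data_engineers`
""")
```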
Addressing Access Issues (e.g., AbfsRestOperationException)
To resolve errors such as "[AbfsRestOperationException] Operation failed: 'This request is not authorized to perform this operation,'" ensure the following:
  • Validate the storage path against the appropriate external location registered in Unity Catalog, and confirm the storage credential has sufficient permissions.
  • Test connectivity to the storage account from Databricks using tools like curl or nslookup to confirm the private endpoints and network configuration are operational (see the DNS check below).
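For the connectivity check, a quick way to confirm the private endpoint DNS is in effect from a notebook (hostname is a placeholder):

```python
# Sketch: resolve the storage endpoints from the Databricks network.
# With a private endpoint and correct private DNS zone wiring, these should
# return a private IP (e.g. 10.x.x.x); a public IP means traffic is not
# going through the private endpoint.
import socket

for host in (
    "<storage-account>.blob.core.windows.net",
    "<storage-account>.dfs.core.windows.net",
):
    print(host, "->", socket.gethostbyname(host))
```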
 
Private Endpoint Configuration for the Unity Catalog Control Plane
The control plane can use back-end Private Link connections for secure operations when workspaces are deployed in VNets with private endpoint support:
  • Each Azure Databricks workspace's control plane can connect privately to core services through back-end Private Link connections. This guards sensitive control plane traffic against public exposure, enhancing security for governance and metadata management.
 
Best Practices for External Locations
  • Avoid mounting storage accounts to DBFS, so that Unity Catalog ACLs are enforced. Use external locations only with ACLs defined at the Unity Catalog level.
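To illustrate the difference (paths are placeholders): reading through a Unity Catalog-governed abfss:// path keeps external-location ACLs in force, whereas a DBFS mount bypasses them.

```python
# Governed access: Unity Catalog checks external location ACLs on this path.
df = spark.read.format("delta").load(
    "abfss://<container>@<storage-account>.dfs.core.windows.net/<path>"
)

# Avoid this: a DBFS mount is workspace-wide and bypasses Unity Catalog ACLs.
# dbutils.fs.mount(source="abfss://...", mount_point="/mnt/data", extra_configs={...})
```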
These steps and best practices are essential to ensure secure and efficient connectivity between Azure Databricks and Azure Storage accounts via Unity Catalog. For troubleshooting and specific configurations, refer to networking guides and Unity Catalog troubleshooting documentation.
 
Cheers, Louis.


Marthinus
New Contributor III

I've read that in the documentation, and when I tried again with an Access Connector for Azure Databricks instead of my own service principal, it worked, shockingly, even with network access on the storage account completely blocked and zero private endpoints. No idea how, but for anyone coming across this: the solution is to use an Access Connector for Azure Databricks.
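For anyone scripting this rather than using Catalog Explorer, a rough sketch with the databricks-sdk Python package (resource IDs and names are placeholders, and the request class name may differ slightly across SDK versions; the UI does the same thing):

```python
# Sketch: register a storage credential backed by an Access Connector's
# managed identity, then an external location that uses it.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()

cred = w.storage_credentials.create(
    name="my_access_connector_cred",
    azure_managed_identity=catalog.AzureManagedIdentityRequest(
        access_connector_id=(
            "/subscriptions/<sub-id>/resourceGroups/<rg>"
            "/providers/Microsoft.Databricks/accessConnectors/<connector>"
        )
    ),
)

w.external_locations.create(
    name="my_external_location",
    url="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    credential_name=cred.name,
)

# The Access Connector's managed identity still needs the
# "Storage Blob Data Contributor" role on the storage account.
```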


