08-11-2025 02:36 PM - edited 08-11-2025 02:48 PM
We have Unity Catalog and an Azure Databricks workspace, both in the Azure West US region, and we want to allow serverless compute to access data in catalogs that use external locations on an Azure Storage account in West US 3. How can we get this to work?
The main constraint: it seems we can't simply whitelist the West US Databricks serverless subnets on the West US 3 storage account, the way we can when everything is in a single region. The article Configure a firewall for serverless compute access has a section on "Cost implications of cross-region storage access", but I haven't found step-by-step instructions for enabling cross-region storage access.
If anyone has successfully implemented serverless cross-region storage access, I'd really appreciate some info on how you managed it!
Thanks,
Lexa
08-11-2025 08:11 PM - edited 08-11-2025 08:13 PM
Hi @lexa_koszegi ,
I think this excerpt from the documentation is crucial:
"For cross-region traffic from Azure Databricks serverless compute (for example, workspace is in East US region and ADLS storage is in West Europe), Azure Databricks routes the traffic through an Azure NAT Gateway service."
For cross-region traffic, serverless compute will use a NAT Gateway; configuring NCC subnets in a resource's firewall only applies to resources that reside in the same region as the NCC.
But since the NAT Gateway provides a stable egress IP address, you can just whitelist that address in your storage account firewall, and serverless should be able to connect.
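If you want to script that firewall change rather than click through the portal, here's a minimal sketch using the azure-mgmt-storage SDK. The subscription ID, resource group, account name, and egress IP below are all placeholders, and this assumes the account's firewall already has its default action set to Deny:

```python
# Sketch: append a serverless egress IP to a storage account's firewall rules.
# SUBSCRIPTION_ID, my-resource-group, mystorageaccount, and 203.0.113.10 are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import IPRule, StorageAccountUpdateParameters

client = StorageManagementClient(DefaultAzureCredential(), "SUBSCRIPTION_ID")

# Fetch the current rule set so we append to existing rules instead of overwriting them.
account = client.storage_accounts.get_properties("my-resource-group", "mystorageaccount")
rules = account.network_rule_set
rules.ip_rules = rules.ip_rules or []
rules.ip_rules.append(IPRule(ip_address_or_range="203.0.113.10"))

client.storage_accounts.update(
    "my-resource-group",
    "mystorageaccount",
    StorageAccountUpdateParameters(network_rule_set=rules),
)
```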
08-12-2025 09:11 AM
Thanks, @szymon_dybczak!
I had gotten that far - creating an Azure NAT Gateway with a public IP (using Terraform) - but when terraform apply tried to create an azurerm_subnet_nat_gateway_association to the US West serverless subnets, the pipeline threw the following 401 Unauthorized error (tenant ID values removed for security reasons):
InvalidAuthenticationTokenTenant: The 'EvolvedSecurityTokenService' access token is from the wrong issuer 'https://sts.windows.net/(our Azure tenant ID)/'. It must match the tenant 'https://sts.windows.net/(Databricks' tenant ID)/' associated with this subscription. Please use the authority (URL) 'https://login.windows.net/(Databricks' tenant ID)' to get the token.
I suppose I could try using the REST API instead of Terraform so I can specify the token's tenant, though I don't know whether our service principal has permission to get a token from Databricks' auth URL.
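For reference, this is roughly what I had in mind - acquiring a token against an explicitly specified tenant with azure-identity and calling ARM directly. All the IDs and the subnet path are placeholders, the api-version is just a recent one, and I haven't verified that our principal can actually obtain a token this way:

```python
# Sketch: get an ARM token from an explicitly chosen tenant, then attach a
# NAT gateway to a subnet via the ARM REST API instead of Terraform.
# Every <...> value below is a placeholder.
import requests
from azure.identity import ClientSecretCredential

credential = ClientSecretCredential(
    tenant_id="<target-tenant-id>",        # the tenant the error message demands
    client_id="<service-principal-id>",
    client_secret="<service-principal-secret>",
)
token = credential.get_token("https://management.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

subnet_url = (
    "https://management.azure.com/subscriptions/<sub-id>"
    "/resourceGroups/<rg>/providers/Microsoft.Network"
    "/virtualNetworks/<vnet>/subnets/<subnet>?api-version=2023-09-01"
)

# A subnet PUT replaces the whole resource, so read it first,
# set the natGateway reference, and write the full body back.
subnet = requests.get(subnet_url, headers=headers).json()
subnet["properties"]["natGateway"] = {"id": "<nat-gateway-resource-id>"}
resp = requests.put(subnet_url, headers=headers, json=subnet)
resp.raise_for_status()
```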
Although at this point I'm thinking to just tell the team that requested this to either move their storage account to US West or use only classic compute. 😛
08-12-2025 09:31 AM - edited 08-12-2025 09:33 AM
Hi @lexa_koszegi ,
Sorry, I probably didn't explain it clearly enough in my previous message. In cross-region scenarios, serverless compute will use a NAT Gateway, but that gateway is already created and managed by Databricks. You don't have to do anything with it.
Just write some Python code that calls a site that returns your public IP, and run it on serverless compute (there are "what's my IP" services that echo back your public IP address). That way you'll learn the public IP, which should be the stable address of the Databricks-managed NAT Gateway. Then just add that address to the storage account firewall (the one in the other region).
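For example, a minimal sketch you could run in a serverless notebook - api.ipify.org is just one such echo service, any "what's my IP" endpoint works the same way:

```python
# Run this on serverless compute to discover the egress IP that the
# Databricks-managed NAT Gateway presents to the outside world.
# api.ipify.org is one example of a public "what's my IP" service.
import requests

egress_ip = requests.get("https://api.ipify.org", timeout=10).text.strip()
print(f"Serverless egress IP: {egress_ip}")
```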
08-12-2025 01:45 PM
Thanks, I was just now reading about the automatically created NAT Gateway, but didn't know how to find it. I'll see about grabbing its IP via a Python script as you advise and check whether that works for our use case!
08-13-2025 07:30 AM
Turns out we don't have a Databricks-managed NAT Gateway because our workspace is deployed in our own VNet and we have SCC (secure cluster connectivity) enabled. I opened a ticket with Microsoft Support and will be working with them today; if we get it figured out, I'll share the info here in case anyone runs into a similar problem.