Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Sharing Issue between AWS and Azure

el_mark
New Contributor

Hi

We have attempted to set up a Delta Share from Azure to AWS.

We can see the Delta Share table and metadata in AWS; however, when we attempt to query the table we hit a problem.

If we use serverless SQL or a serverless notebook and whitelist the IP address of the Databricks serverless cluster, the query returns the expected results.

However, if we use a dedicated IP address from the AWS VPC with a non-serverless SQL warehouse and dedicated compute instances, we get the following error:

“HTTP 500 INTERNAL_ERROR

Reason: DS_INTERNAL_ERROR_FROM_DB_DS_SERVER

Endpoint: https://data-sharing.****.internal.azuredatabricks.net:443/api/2.0/delta-sharing/metastores/<metasto...

Trace ID: f92422c0b36284fe3d03aca010de9953”

Any ideas what is stopping the dedicated IP from returning results from the same share?

Thanks in advance for any insights.

Mark


3 REPLIES

ManojkMohan
Honored Contributor II

@el_mark 

Root Cause
Serverless SQL and notebook queries succeed because the Databricks serverless IP addresses you whitelisted are allowed through the Azure storage account firewall.

Dedicated compute instances or non-serverless SQL warehouses on AWS typically use different IP addresses (e.g., from AWS VPC egress) that must be explicitly allowed on the Azure storage account firewall.


Solution:

Storage Firewall Rules

Verify the Azure Storage Account firewall includes the egress IP addresses used by your dedicated AWS VPC and non-serverless compute instances.

Network Connectivity Configuration

If your dedicated compute uses private IPs or VPC security groups, ensure that routing and DNS resolution from the VPC allow it to reach the Azure storage endpoint URLs.
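A rough way to test the routing and DNS point from a notebook on the dedicated compute is sketched below. It assumes the shared data sits behind the standard public Blob and ADLS Gen2 hostnames; the account name `mystorageacct` is a placeholder, and `storage_hosts` and `check_endpoint` are illustrative helper names, not part of any Databricks API. Note that this only checks DNS resolution and TCP reachability. It cannot tell you whether the storage firewall will accept the request.

```python
import socket


def storage_hosts(account):
    """Build the standard public Blob and ADLS Gen2 hostnames for a storage account."""
    return [f"{account}.blob.core.windows.net", f"{account}.dfs.core.windows.net"]


def check_endpoint(host, port=443, timeout=5):
    """Resolve host and attempt a TCP connection; return (resolved_ips, connected)."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # DNS resolution failed, so nothing to connect to.
        return [], False
    ips = sorted({info[4][0] for info in infos})
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return ips, True
    except OSError:
        # Resolved, but the TCP connection was refused or timed out.
        return ips, False


# Example (run on the dedicated compute; "mystorageacct" is a placeholder):
# for host in storage_hosts("mystorageacct"):
#     print(host, check_endpoint(host))
```

If the hostnames resolve and connect but queries still fail, the storage firewall or the provider-side IP access list is the more likely culprit than routing.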

Delta Sharing IP Access List

If using Delta Sharing IP access lists on the provider side, add the dedicated IP addresses to those lists to ensure data access.

Cross-Cloud Access Best Practices

Check that the IPs or CIDR blocks your dedicated AWS VPC compute uses are explicitly allowed in Azure Storage firewalls.

 

Azure Databricks Delta Sharing troubleshooting guide: https://learn.microsoft.com/en-us/azure/databricks/delta-sharing/troubleshooting


IP restrictions and access controls for Delta Sharing: https://learn.microsoft.com/en-us/azure/databricks/delta-sharing/access-list

el_mark
New Contributor

Thank you @ManojkMohan.

I can see the correct IP address when I call ipify from a compute notebook. So from what you are saying above, that implies the issue is with the Azure Storage firewall, right?

ManojkMohan
Honored Contributor II

@el_mark 

If your notebook on the dedicated compute shows the expected public IP via ipify, but queries still fail while serverless works, it strongly suggests that the Azure Storage firewall (or network rules on the storage account) is only allowing the serverless IPs and not the egress IPs from your AWS VPC.

Check that:

  1. The egress IP/CIDR of your AWS VPC (non‑serverless warehouse / compute) is added to the Azure Storage Account firewall allow list.
  2. Any Delta Sharing IP access list on the provider side also includes this IP/CIDR: https://community.databricks.com/t5/data-engineering/delta-sharing-open-issue-with-access-data-on-st...
  3. DNS/routing from that VPC can reach the Azure storage endpoint.
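For checks 1 and 2, a minimal sketch like the following can compare the egress IP seen from the dedicated compute against the ranges you believe are allowed. It assumes the public ipify endpoint `https://api.ipify.org` is reachable from the cluster; `fetch_egress_ip` and `matching_rules` are illustrative helper names, and the CIDRs in the commented example are documentation placeholders, not real rules. The allow list itself has to be copied by hand from the storage account firewall and the Delta Sharing IP access list.

```python
import ipaddress
import urllib.request


def fetch_egress_ip(url="https://api.ipify.org", timeout=10):
    """Return the public IP that outbound traffic from this compute appears to use."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def matching_rules(ip, allowed_cidrs):
    """Return the allow-list entries (single IPs or CIDR ranges) that contain ip."""
    addr = ipaddress.ip_address(ip)
    return [
        cidr
        for cidr in allowed_cidrs
        if addr in ipaddress.ip_network(cidr, strict=False)
    ]


# Example (run on the dedicated compute; the ranges below are placeholders):
# allowed = ["203.0.113.0/24", "198.51.100.4"]
# ip = fetch_egress_ip()
# print(ip, matching_rules(ip, allowed) or "not covered by any listed rule")
```

An empty result means the egress IP is not covered by any rule you listed, which would be consistent with serverless working while the dedicated compute fails.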