
Databricks to SFTP: Connection Fails Even with Whitelisted NAT Gateway IP

jeremy98
Honored Contributor

Hi community,

I'm experiencing a strange issue with my connection from Databricks to an SFTP server.

I provided them with an IP address created for Databricks via a NAT gateway, and that IP is whitelisted on their side. However, even though I have the correct credentials, I'm still having trouble connecting to the SFTP server.

Could you help me understand what might be causing this issue and what I should check or fix?

33 REPLIES

jeremy98
Honored Contributor

Hi @szymon_dybczak,

But do we need to set an outbound rule in the network security group of Databricks?

szymon_dybczak
Esteemed Contributor III

Yes, you should have an outbound rule that allows outbound traffic from the Databricks subnets to the SFTP destination on the proper port.

Hi syz,

Do you mean that the source address needs to be the NAT Gateway IP or the Databricks subnet? And the destination should be the client's IP?

Could there also be a need for an INBOUND rule?

PS2: the SFTP server is hosted inside their Azure services.

szymon_dybczak
Esteemed Contributor III

Hi @jeremy98 ,

On your side you should have something like this:

Outbound NSG Rule (on your Databricks subnet NSG):

Direction: Outbound
Priority: set according to your NSG rules (a lower number means higher priority)
Source: VirtualNetwork or your Databricks subnet IP range
Source port: *
Destination: IP of the SFTP server
Destination port: 22
Protocol: TCP
Action: Allow
Name: e.g. Allow-SFTP-Out

But to be honest, in this kind of troubleshooting both parties should be involved. Even a simple verification on their side, like providing logs showing which IP address is connecting to the SFTP server, could help diagnose the problem faster.
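
If you want to quickly confirm that rule from a Databricks notebook, a plain TCP check is enough. A minimal sketch; the host below is a placeholder for your SFTP address:

import socket

SFTP_HOST, SFTP_PORT = "your_sftp_address", 22  # placeholder values

# With the outbound NSG rule (and NAT Gateway) in place this should connect;
# a timeout usually points at NSG/routing, an immediate refusal at the server.
try:
    with socket.create_connection((SFTP_HOST, SFTP_PORT), timeout=10):
        print(f"TCP connection to {SFTP_HOST}:{SFTP_PORT} succeeded")
except OSError as e:
    print(f"TCP connection failed: {e}")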

jeremy98
Honored Contributor

Hi @szymon_dybczak

So, does the SFTP side need to whitelist our subnet address instead of our NAT Gateway IP?

Yep, we are going to ask them if they have any logs, but previously they said no.

 

szymon_dybczak
Esteemed Contributor III

Hi @jeremy98 ,

No, on the SFTP side they need to whitelist the NAT Gateway IP (which they already did, based on your previous messages). So on your side everything looks kind of correct: you have a NAT Gateway, your Databricks subnets use that NAT Gateway for outbound traffic, and you correctly set up the NSG (and I assume you don't use Azure Firewall in your environment).
Moreover, the following test was successful, so maybe this is not a connectivity issue, but rather some kind of issue on the SFTP side.

nc -zv your_sftp_address 22

So, it looks like you are able to reach that server, but then the SFTP server is terminating the session during or immediately after the auth handshake. Maybe the server expects something specific, like a particular SSH version or SSH banner? Hard to say without proper logs.
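
One way to narrow it down is to capture paramiko's own debug log and print the server's SSH banner before attempting authentication. A minimal sketch; host and credentials are placeholders:

import logging
import socket
import paramiko

HOST, PORT = "your_sftp_address", 22  # placeholders

# Write the full SSH negotiation to a file so you can see exactly where the
# server drops the session (banner, key exchange, authentication, ...).
paramiko.util.log_to_file("/tmp/paramiko_debug.log", level=logging.DEBUG)

sock = socket.create_connection((HOST, PORT), timeout=10)
transport = paramiko.Transport(sock)
try:
    transport.start_client(timeout=10)              # banner + key exchange
    print("Server SSH banner:", transport.remote_version)
    transport.auth_password("your_user", "your_password")
    print("Authenticated:", transport.is_authenticated())
finally:
    transport.close()

The debug log will show whether the failure happens during key exchange or only after the password is sent.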

 

Kenji_3000
New Contributor III

Hi all,

Thanks again to @szymon_dybczak for the earlier help!

I'm working alongside Jeremy, and we've been debugging outbound connectivity for Databricks traffic going through a NAT Gateway, specifically for connecting to an SFTP server also hosted in Azure.

Our current theory is that Azure's backbone network is overriding NAT Gateway routing, since both source (Databricks) and destination (SFTP) are on Azure.

We found this in the Azure docs on user-defined routes:

If the destination address is for an Azure service, Azure routes the traffic directly to the service over the Azure backbone network instead of routing the traffic to the internet.
Traffic between Azure services doesn't traverse the internet, regardless of region.
You can override the Azure default system route for 0.0.0.0/0 with a custom route.
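
To double-check what routing is actually in effect on the workers (for example, whether the old Microsoft.Storage service endpoint route is really gone, or whether a UDR is applied), one option is to dump the effective routes of a worker NIC with the azure-mgmt-network SDK. A sketch; the names below are placeholders, and the worker NICs live in the Databricks managed resource group:

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "<subscription-id>"                # placeholder
MANAGED_RG = "<databricks-managed-resource-group>"   # placeholder
WORKER_NIC = "<worker-nic-name>"                     # placeholder

client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Effective routes combine system routes, UDRs, BGP and service endpoints,
# i.e. what Azure actually applies to traffic leaving this NIC.
result = client.network_interfaces.begin_get_effective_route_table(
    MANAGED_RG, WORKER_NIC
).result()

for route in result.value:
    print(route.source, route.address_prefix, route.next_hop_type, route.next_hop_ip_address)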


โœ… What We've Confirmed

  1. NAT Gateway is configured and working

    • Public IP is returned correctly via:

      import requests
      requests.get("https://api.ipify.org", timeout=10).text
  2. Databricks subnets (both public and private)

    • Are explicitly associated with the NAT Gateway

    • Have "private subnet (no default outbound access)" setting enabled

  3. Port 22 is open

    • NSG rules allow outbound TCP 22

    • Authentication to the SFTP server succeeds, but fails shortly after login

  4. Egress IP Test Shows Internal Address
    When we run a basic egress test like this:

    import socket

    def test_egress_ip(destination, port=22):
        # connect() on a UDP socket sends nothing; it only selects a local
        # interface, so this reports the local (pre-NAT) source address.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.connect((destination, port))
            return sock.getsockname()[0]
        except Exception as e:
            print(f"Error: {e}")
            return None
        finally:
            sock.close()

    It returns a private IP, not the expected NAT Gateway public IP.

  5. Service endpoints are not enabled
    (We previously had Microsoft.Storage as a service endpoint, but removed it to avoid bypassing NAT.)


โ“ Questions

  • Are there specific subnet settings (beyond NSG and service endpoint removal) that we need to check in order to force public egress?

  • Do we need a custom route table explicitly targeting 0.0.0.0/0 to Internet to guarantee traffic goes out via the NAT Gateway, even when the destination is another Azure-hosted service?

Any insights or experiences dealing with this kind of internal routing override would be super helpful. Thanks!

 

Here is a quick sketch of how I think the flow is going:

[attached diagram: Kenji_3000_0-1753456347782.png]

 

szymon_dybczak
Esteemed Contributor III

Hi @Kenji_3000 ,

Thanks for all the details. Your suspicion could be correct. I think the best way to check it is to create a UDR that forces all outgoing traffic from the Databricks subnets to go to the Internet (and hence use the NAT Gateway).
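
For example, with the azure-mgmt-network SDK (a minimal sketch; subscription, resource group, names and location are placeholders, and the route table still has to be associated with both Databricks subnets afterwards):

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "<resource-group>"     # placeholder
LOCATION = "westeurope"                 # placeholder
ROUTE_TABLE = "rt-databricks-egress"    # hypothetical name

client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# 1. Create an empty route table.
client.route_tables.begin_create_or_update(
    RESOURCE_GROUP, ROUTE_TABLE, {"location": LOCATION}
).result()

# 2. Add a UDR that overrides the default system route so outbound traffic
#    uses the Internet next hop (and therefore the NAT Gateway on the subnet).
client.routes.begin_create_or_update(
    RESOURCE_GROUP,
    ROUTE_TABLE,
    "force-internet",
    {"address_prefix": "0.0.0.0/0", "next_hop_type": "Internet"},
).result()

# 3. Associate the route table with both Databricks subnets (portal, CLI or
#    subnets.begin_create_or_update) for the route to take effect.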
Did you guys get any logs from the SFTP admins? What kind of IP address do they see in the logs when you're attempting to connect?

Hi, 

Thanks for your help again. The client doesn't want to give us any logs, because they say they don't have logs on their side.

jeremy98
Honored Contributor

up

jeremy98
Honored Contributor

@szymon_dybczak, I noticed that the client's password has a '/' in it. Could this be a potential cause of the error?

szymon_dybczak
Esteemed Contributor III

Hi @jeremy98 ,

Could be. Look at the following thread; they used an ampersand in a password and it gave them a headache.

Ampersand in Password - RouterOS / Scripting - MikroTik community forum

In another thread someone had authentication issues and tried the look_for_keys=False option. Could be worth trying:

ssh.connect(hostname="x.x.x.x", port=xxxx, username="x", password="x", look_for_keys=False)

Also, another thing worth trying is to downgrade paramiko to version 2.8.1, or to set disabled_algorithms={'keys': ['rsa-sha2-256', 'rsa-sha2-512']}:

SSH Authentication fails with 2.9.2 · Issue #1984 · paramiko/paramiko

python - Paramiko AuthenticationException issue - Stack Overflow
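
Putting those options together, a minimal password-auth sketch (host, port and credentials are placeholders; note that the linked paramiko issue disables the 'pubkeys' entries rather than 'keys'):

import paramiko

HOST, PORT = "your_sftp_address", 22            # placeholders
USER, PASSWORD = "your_user", "your_password"   # placeholders

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(
    hostname=HOST,
    port=PORT,
    username=USER,
    password=PASSWORD,
    look_for_keys=False,   # don't scan ~/.ssh for key files
    allow_agent=False,     # don't query an SSH agent; force password auth
    # Workaround from the issue linked above (paramiko >= 2.9):
    disabled_algorithms={"pubkeys": ["rsa-sha2-256", "rsa-sha2-512"]},
)
sftp = ssh.open_sftp()
print(sftp.listdir("."))
sftp.close()
ssh.close()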

Hi syz, 

Doesn't change the issue, very strange.

szymon_dybczak
Esteemed Contributor III

Did you also try downgrading paramiko to a lower version?

Kenji_3000
New Contributor III

Hi @szymon_dybczak ,

We confirmed that it is indeed the backbone network that is causing the issue, as we fetched the logs of the SFTP server:
Databricks --> SFTP (region outside Europe West) = public NAT Gateway IP
Databricks --> SFTP (region Europe West) = private IP

We're currently in contact with Microsoft support about overriding this backbone routing. I tried to define a route table with:

address prefix: 20.60.0.0/16 (all Azure storage account space; also tested with 0.0.0.0/32)
next hop type: Internet

Unfortunately this also doesn't work. Any ideas?
