Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Databricks to SFTP: Connection Fails Even with Whitelisted NAT Gateway IP

jeremy98
Honored Contributor

Hi community,

I’m experiencing a strange issue with my connection from Databricks to an SFTP server.

I provided them with an IP address created for Databricks via a NAT gateway, and that IP is whitelisted on their side. However, even though I have the correct credentials, I’m still having trouble connecting to the SFTP server.

Could you help me understand what might be causing this issue and what I should check or fix?

33 REPLIES 33

szymon_dybczak
Esteemed Contributor III

Hi @jeremy98 ,

Could you share the exact error that you get?

jeremy98
Honored Contributor

Hi,

Thanks. I'm trying to use paramiko to connect, with credentials provided by one of our clients, to send outbound data.

This is the error that I'm receiving: Authentication failed: transport shut down or saw EOF

szymon_dybczak
Esteemed Contributor III

Hi @jeremy98 ,

Thanks for the further details. Let's start with the following test. Run the code below in a Databricks notebook:

import requests
requests.get("https://api.ipify.org").text

The command above should return your public IP address. It should be the same address that was added to the SFTP server's whitelist. Could you check it?
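To make the comparison explicit, here is a minimal sketch that checks the observed egress IP against the whitelisted one. The helper names `fetch_public_ip` and `egress_ip_matches` are made up for illustration, and it uses only the standard library instead of `requests`. The address in the usage comment is a documentation placeholder, not a real value from this thread.

```python
import ipaddress
import urllib.request


def fetch_public_ip(url="https://api.ipify.org", timeout=10):
    """Return the public IP that the echo service sees for this host."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def egress_ip_matches(expected_ip, fetch=fetch_public_ip):
    """Compare the observed egress IP with the whitelisted one.

    Returns a (matches, observed_ip) tuple. `fetch` is injectable so the
    comparison can be exercised without network access.
    """
    observed = ipaddress.ip_address(fetch().strip())
    expected = ipaddress.ip_address(expected_ip.strip())
    return observed == expected, str(observed)


# Usage in a notebook cell (placeholder address, replace with the NAT Gateway IP):
# matches, observed = egress_ip_matches("203.0.113.10")
# print(matches, observed)
```

Note that this only shows which IP is used for HTTPS traffic to the echo service. It is reasonable to expect SSH traffic to leave through the same gateway, but the snippet does not prove it.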

jeremy98
Honored Contributor

Hi @szymon_dybczak ,

 

Thank you for your response.

We’ve set up a static IP using a NAT Gateway, which the compute resources within our virtual network now use. I tried creating an outbound rule in the Network Security Group to allow traffic from the virtual network (from which the data is being sent) to the SFTP server, with the destination set to the IP address I want to send the data to. Is my reasoning correct? Btw, it doesn't work 😞

szymon_dybczak
Esteemed Contributor III

So your approach is correct. A NAT gateway provides a stable egress IP address, and that address can be whitelisted on the SFTP server. But remember to route outbound traffic from the Databricks subnets to that NAT Gateway using, e.g., User-Defined Routes.
When you wrote "Btw, it doesn't work", did you mean that the following script didn't work?

import requests
requests.get("https://api.ipify.org").text


If the script above returned the public IP address of your NAT Gateway, then I have another thing to check. I've had weird issues with paramiko in the past. Could you check whether you are able to connect with pysftp?

1. First, install the library

pip install pysftp

2. Try to connect

import pysftp

hostname = 'your_hostname'
username = 'your_username'
password = 'your_password'

cnopts = pysftp.CnOpts()
# Disables host key verification. Fine for a quick connectivity test, not for production.
cnopts.hostkeys = None

try:
    with pysftp.Connection(host=hostname, username=username, password=password, cnopts=cnopts) as sftp:
        print("SFTP connection successful")
except Exception as e:
    print("SFTP connection failed: ", str(e))

 

jeremy98
Honored Contributor

And yes, the IP that we set up is whitelisted on the customer's side.

jeremy98
Honored Contributor

Hi @szymon_dybczak,

Yes, I confirmed that I'm getting the same IP returned from the NAT Gateway. I also tried connecting using pysftp, but unfortunately, I still can't connect to the client's SFTP server.

Regarding the outbound rule—I believe I might need your guidance here. I added an outbound rule (priority 115) in the Network Security Group (NSG) to allow traffic on port 22 to the specified IP address within the virtual network where the NSG is attached. Could you confirm if that setup is correct?

Also, I think I may have missed this part you mentioned:

"But remember to route outbound traffic from Databricks subnets to that NAT Gateway using e.g. User-Defined Routes."

I'll look into implementing a User-Defined Route for outbound traffic from the Databricks subnets to the NAT Gateway. Please let me know if there's anything else I should verify.

szymon_dybczak
Esteemed Contributor III

Hi @jeremy98 ,

Your network setup should be correct. If the test above returned the expected IP address, your Databricks subnets are using the NAT Gateway for egress traffic correctly.
OK, could you run the following command in a shell cell?

nc -zv your_sftp_address 22

Also, when you tried to connect using pysftp, what error did you get? The same one as with paramiko?
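If a shell cell is not convenient, roughly the same reachability check can be sketched in Python. The function name `tcp_port_open` is hypothetical. It only attempts a TCP handshake, as `nc -z` does, and says nothing about whether SSH authentication will succeed.

```python
import socket


def tcp_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


# Usage in a notebook cell:
# print(tcp_port_open("your_sftp_address", 22))
```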

jeremy98
Honored Contributor

Do you mean this setup is missing? @szymon_dybczak

jeremy98_0-1753313617288.png

 

jeremy98
Honored Contributor

ping

jeremy98
Honored Contributor

Hi @szymon_dybczak ,

I tried pysftp and we still got the same error.

But there is something we don't understand. When we run this code:


```
import socket

def get_ssh_egress_ip(destination="xxxx"):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connect to port 22, no data sent, just triggers OS to assign route
        sock.connect((destination, 22))
        local_ip = sock.getsockname()[0]
        print(f"Detected egress IP used for SSH to {destination}: {local_ip}")
    except Exception as e:
        print(f"Error detecting egress IP: {e}")
    finally:
        sock.close()

get_ssh_egress_ip()
```

we get the subnet IP and not the NAT Gateway IP. So maybe traffic from the subnet is not being routed through the NAT Gateway...

And running the bash command, we got "xxxx ... 22 (ssh) open".

szymon_dybczak
Esteemed Contributor III

If your Databricks subnets are linked to the NAT Gateway, then all outbound traffic should go through this gateway.
Can you ask the SFTP administrator for logs? What IP address do they see when you try to connect? The reply "xxxx ... 22 (ssh) open" indicates that, from a network perspective, you were able to reach the destination server and the port is open.
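An open port only confirms the TCP handshake. As a possible next step when server logs are unavailable, the sketch below reads the identification line that an SSH server normally sends right after the connection is established (a line starting with `SSH-2.0-`). The function name `read_ssh_banner` is hypothetical, and the simplification of returning only the first line is an assumption. If it returns None, the connection was closed before any banner arrived, which may suggest the session is being dropped before authentication rather than failing on credentials.

```python
import socket


def read_ssh_banner(host, port=22, timeout=5.0):
    """Return the first line the server sends, or None if the peer
    closes the connection before sending anything.

    A timeout or connection error is raised as OSError.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # The SSH identification string is at most 255 bytes long
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(255)
            if not chunk:  # peer closed the connection
                break
            data += chunk
    line = data.split(b"\n", 1)[0].strip()
    return line.decode("ascii", errors="replace") or None


# Usage in a notebook cell:
# print(read_ssh_banner("your_sftp_address"))
```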

jeremy98
Honored Contributor

Hi @szymon_dybczak,

Yes, the subnets are connected to our new NAT gateway, but we’re still experiencing communication issues.

Unfortunately, we don’t have the option to request logs from the SFTP admin. The IP address we’re connecting to (which I haven’t shared here for security reasons) is correct — the port appears to be open.

However, I’m wondering why, when executing the previous code, the source IP is still the private IP. Shouldn’t it be the NAT gateway’s public IP instead?

szymon_dybczak
Esteemed Contributor III

" I’m wondering why, when executing the previous code, the source IP is still the private IP. Shouldn’t it be the NAT gateway’s public IP instead?"

Regarding this part, I think sock.getsockname()[0] returns the local IP address from the subnet (before NAT happens). Your traffic leaves from a private IP, and SNAT is expected to occur later, at the NAT gateway boundary.
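A small sketch of why this is expected: `getsockname()` reports the address of the local end of the socket, chosen by the host's own routing table, so it cannot reflect any translation done by a gateway further along the path. The helper name `local_source_ip` is made up for illustration.

```python
import socket


def local_source_ip(destination, port=22):
    """Return the local address the OS would use to reach `destination`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Connecting a UDP socket sends no packet; it only selects a route
        sock.connect((destination, port))
        return sock.getsockname()[0]


# With a loopback destination the local end is also loopback,
# which shows the value comes purely from local routing:
print(local_source_ip("127.0.0.1"))
```

To see the address after SNAT, the IP has to be observed from outside the network, for example by an external echo service or in the SFTP server's logs.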

