Data Engineering

SFTP Connection Timeout on Job Cluster but Works on Serverless Compute

Edoa
New Contributor

Hi all,

I'm experiencing inconsistent behavior when connecting to an SFTP server using Paramiko in Databricks.

  • When I run the code on Serverless Compute, the connection to xxx.yyy.com via SFTP works correctly.

  • When I run the same code on a Job Cluster, it fails with the following error:

SSHException: Unable to connect to xxx.yyy.com: [Errno 110] Connection timed out

Key snippet:

transport = paramiko.Transport((host, port))
transport.connect(username=username, password=password)

Is there any workaround or configuration needed to align the Job Cluster network permissions with those of Serverless Compute, especially to allow outbound SFTP (port 22) connections?

Thanks in advance for your help!

1 REPLY

lingareddy_Alva
Honored Contributor II

Hi @Edoa 

This is a common networking issue in Databricks, caused by the different network configurations of Serverless Compute and Job Clusters.
Here are the key differences and potential solutions:

Root Cause
Serverless Compute runs in Databricks' managed infrastructure with pre-configured network access,
while Job Clusters run in your workspace's VPC/network with potentially more restrictive networking rules.

Potential Solutions
1. Network Security Groups/Firewall Rules
Check your cloud provider's network security configuration:
For AWS:
- Ensure your VPC security groups allow outbound traffic on port 22 (a boto3 sketch follows this list)
- Check NACLs (Network Access Control Lists) for outbound SSH rules

For Azure:
- Verify Network Security Groups allow outbound port 22
- Check if Azure Firewall is blocking the connection

For GCP:
- Review VPC firewall rules for outbound SSH access
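
To make the AWS case concrete, here is a minimal boto3 sketch of adding an outbound port 22 rule to a security group. The security group ID, region, and CIDR below are placeholders for your workspace VPC's security group and your SFTP server's address range; in practice you would usually manage this through the console or your IaC tooling rather than ad hoc:

import boto3

# Placeholders - replace with your workspace VPC's security group
# and your SFTP server's public IP range.
SECURITY_GROUP_ID = "sg-0123456789abcdef0"
SFTP_SERVER_CIDR = "203.0.113.10/32"

ec2 = boto3.client("ec2", region_name="us-east-1")

# Allow outbound SSH/SFTP (TCP 22) from the cluster subnets to the SFTP server
ec2.authorize_security_group_egress(
    GroupId=SECURITY_GROUP_ID,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": SFTP_SERVER_CIDR, "Description": "Outbound SFTP"}],
        }
    ],
)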

2. Explicit Connection Timeouts in Your Code

# Try specifying an explicit socket timeout and connection parameters
import paramiko
import socket

def connect_with_timeout(host, port, username, password, timeout=30):
    try:
        # Create and connect a socket with an explicit timeout
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        sock.connect((host, port))

        # Hand the already-connected socket to Paramiko
        transport = paramiko.Transport(sock)
        transport.connect(username=username, password=password)

        return transport
    except Exception as e:
        print(f"Connection failed: {e}")
        raise

# Usage
transport = connect_with_timeout(host, 22, username, password)
sftp = paramiko.SFTPClient.from_transport(transport)


3. Alternative Connection Methods

# Try using SSHClient instead of Transport directly
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

try:
    client.connect(
        hostname=host,
        port=port,
        username=username,
        password=password,
        timeout=30,
        banner_timeout=30,
        auth_timeout=30
    )

    sftp = client.open_sftp()
    # Your SFTP operations here

except Exception as e:
    print(f"Connection error: {e}")
finally:
    client.close()

4. VPC Endpoints/Private Connectivity
If your SFTP server is internal, consider:
- Setting up VPC endpoints for private connectivity
- Using Databricks Private Link if available
- Configuring custom DNS resolution

5. Cluster Configuration
Add these configurations to your job cluster:

# In the cluster's Spark config (set on the cluster itself, not via spark.conf.set at runtime)
spark.databricks.cluster.profile serverless   # if available

Or use init scripts to configure networking:

#!/bin/bash
# Example init script for network troubleshooting
echo "nameserver 8.8.8.8" >> /etc/resolv.conf

6. Diagnostic Steps
Add this debugging code to understand the network situation:

import socket

def diagnose_connectivity(host, port=22):
    # Test basic TCP connectivity
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(10)
        result = sock.connect_ex((host, port))
        sock.close()

        if result == 0:
            print(f"✓ Port {port} is reachable on {host}")
        else:
            print(f"✗ Port {port} is not reachable on {host}")

    except Exception as e:
        print(f"Connection test failed: {e}")

    # Test DNS resolution
    try:
        ip = socket.gethostbyname(host)
        print(f"✓ DNS resolution: {host} -> {ip}")
    except Exception as e:
        print(f"✗ DNS resolution failed: {e}")

# Run diagnostics
diagnose_connectivity("xxx.yyy.com")

7. Workaround: Use Databricks Connect
If possible, consider using Databricks Connect to run the SFTP operations from your local environment or
a VM with proper network access.
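
As a rough sketch of that pattern, assuming Databricks Connect (databricks-connect for Databricks Runtime 13+) is configured locally and the file is CSV: the SFTP download runs on the machine executing the script, which has network access to the server, and only the DataFrame work runs on the cluster. All hostnames, credentials, paths, and the target table name below are placeholders.

import paramiko
import pandas as pd
from databricks.connect import DatabricksSession

# Placeholders - adjust for your environment
SFTP_HOST = "xxx.yyy.com"
SFTP_USER = "user"
SFTP_PASSWORD = "password"
REMOTE_PATH = "/outbound/data.csv"
LOCAL_PATH = "/tmp/data.csv"

# 1. Download the file locally, where port 22 is reachable
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname=SFTP_HOST, port=22, username=SFTP_USER,
               password=SFTP_PASSWORD, timeout=30)
sftp = client.open_sftp()
sftp.get(REMOTE_PATH, LOCAL_PATH)
sftp.close()
client.close()

# 2. Push the data to Databricks via Databricks Connect
spark = DatabricksSession.builder.getOrCreate()  # uses your local Databricks config/profile
pdf = pd.read_csv(LOCAL_PATH)
df = spark.createDataFrame(pdf)
df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.sftp_data")  # hypothetical target table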

Recommended Approach
First, run the diagnostic code to understand what's failing
Then, check your cloud provider's network security groups/firewall rules
Finally, try the alternative connection methods with explicit timeouts

The most likely solution is updating your VPC's outbound security
group rules to allow port 22 traffic to your SFTP server's IP range.

