<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Trying to connect SFTP directly in Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/trying-to-connect-sftp-directly-in-databricks/m-p/114333#M44787</link>
    <description>&lt;P&gt;Hi,&amp;nbsp;&lt;BR /&gt;As a proof of concept, I have created an ADLS storage account, enabled SFTP, and created a local user with an SSH private key.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I am trying to connect to this SFTP endpoint directly from Databricks to create a table or DataFrame.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Below are the code snippets I have used and the Maven library:&lt;/P&gt;&lt;DIV&gt;&lt;SPAN class=""&gt;com.springml : spark-sftp_2.11:1.1.5&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN class=""&gt;code 1:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Shetty_1338_0-1743625413680.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15782i86B3770E4B42409C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Shetty_1338_0-1743625413680.png" alt="Shetty_1338_0-1743625413680.png" /&gt;&lt;/span&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;code 2:&amp;nbsp;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Shetty_1338_1-1743625438294.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15783i1C2834A66E183E7E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Shetty_1338_1-1743625438294.png" alt="Shetty_1338_1-1743625438294.png" /&gt;&lt;/span&gt;&lt;P&gt;I am getting the error below for both snippets.
I am not sure where the code or values need to be changed.&lt;/P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Shetty_1338_2-1743625482972.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15784i5BD4E0F48F4800CD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Shetty_1338_2-1743625482972.png" alt="Shetty_1338_2-1743625482972.png" /&gt;&lt;/span&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;BR /&gt;Can anyone help me resolve this?&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Wed, 02 Apr 2025 20:25:42 GMT</pubDate>
    <dc:creator>Shetty_1338</dc:creator>
    <dc:date>2025-04-02T20:25:42Z</dc:date>
    <item>
      <title>Trying to connect SFTP directly in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-connect-sftp-directly-in-databricks/m-p/114333#M44787</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;BR /&gt;As a proof of concept, I have created an ADLS storage account, enabled SFTP, and created a local user with an SSH private key.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I am trying to connect to this SFTP endpoint directly from Databricks to create a table or DataFrame.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Below are the code snippets I have used and the Maven library:&lt;/P&gt;&lt;DIV&gt;&lt;SPAN class=""&gt;com.springml : spark-sftp_2.11:1.1.5&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN class=""&gt;code 1:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Shetty_1338_0-1743625413680.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15782i86B3770E4B42409C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Shetty_1338_0-1743625413680.png" alt="Shetty_1338_0-1743625413680.png" /&gt;&lt;/span&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;code 2:&amp;nbsp;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Shetty_1338_1-1743625438294.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15783i1C2834A66E183E7E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Shetty_1338_1-1743625438294.png" alt="Shetty_1338_1-1743625438294.png" /&gt;&lt;/span&gt;&lt;P&gt;I am getting the error below for both snippets.
I am not sure where the code or values need to be changed.&lt;/P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Shetty_1338_2-1743625482972.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15784i5BD4E0F48F4800CD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Shetty_1338_2-1743625482972.png" alt="Shetty_1338_2-1743625482972.png" /&gt;&lt;/span&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;BR /&gt;Can anyone help me resolve this?&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 02 Apr 2025 20:25:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-connect-sftp-directly-in-databricks/m-p/114333#M44787</guid>
      <dc:creator>Shetty_1338</dc:creator>
      <dc:date>2025-04-02T20:25:42Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to connect SFTP directly in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/trying-to-connect-sftp-directly-in-databricks/m-p/114749#M44931</link>
      <description>&lt;P&gt;```scala&lt;BR /&gt;// For Spark 3.x with Scala 2.12 (common in newer Databricks runtimes)&lt;BR /&gt;com.springml:spark-sftp_2.12:1.1.5&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;```scala&lt;BR /&gt;com.springml:spark-sftp_2.11:1.1.5&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;```scala&lt;BR /&gt;val df = spark.read&lt;BR /&gt;.format("com.springml.spark.sftp")&lt;BR /&gt;.option("host", "your-sftp-host.com")&lt;BR /&gt;.option("port", "22") // Default SFTP port&lt;BR /&gt;.option("username", "your-username")&lt;BR /&gt;.option("pem", "/path/to/your/private/key") // Or use password option&lt;BR /&gt;.option("fileType", "csv") // Specify your file type&lt;BR /&gt;.option("delimiter", ",") // For CSV files&lt;BR /&gt;.option("header", "true") // If your file has headers&lt;BR /&gt;.load("/remote/path/to/your/file.csv")&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;```sh&lt;BR /&gt;%sh&lt;BR /&gt;# Check DNS resolution&lt;BR /&gt;dig +short your-sftp-host.com&lt;/P&gt;
&lt;P&gt;# Check port connectivity&lt;BR /&gt;nc -vz your-sftp-host.com 22&lt;BR /&gt;```&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;```python&lt;BR /&gt;%pip install paramiko&lt;/P&gt;
&lt;P&gt;import paramiko&lt;/P&gt;
&lt;P&gt;# Set up connection parameters&lt;BR /&gt;hostname = "your-sftp-host.com"&lt;BR /&gt;username = "your-username"&lt;BR /&gt;key_path = "/dbfs/path/to/your/private/key" # Or use password authentication&lt;/P&gt;
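&lt;P&gt;# Create SSH client&lt;BR /&gt;ssh_client = paramiko.SSHClient()&lt;BR /&gt;# AutoAddPolicy trusts any host key the server presents; acceptable for a proof of concept only&lt;BR /&gt;ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())&lt;/P&gt;
&lt;P&gt;# Connect with private key (RSAKey assumes an RSA key; other key types need the matching paramiko key class, e.g. Ed25519Key)&lt;BR /&gt;key = paramiko.RSAKey.from_private_key_file(key_path)&lt;BR /&gt;ssh_client.connect(hostname=hostname, username=username, pkey=key)&lt;/P&gt;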
&lt;P&gt;# Create SFTP session&lt;BR /&gt;sftp_client = ssh_client.open_sftp()&lt;/P&gt;
&lt;P&gt;# Download file to local storage&lt;BR /&gt;local_path = "/tmp/downloaded_file.csv"&lt;BR /&gt;remote_path = "/remote/path/to/file.csv"&lt;BR /&gt;sftp_client.get(remote_path, local_path)&lt;/P&gt;
&lt;P&gt;# Close connection&lt;BR /&gt;sftp_client.close()&lt;BR /&gt;ssh_client.close()&lt;/P&gt;
&lt;P&gt;# Read the downloaded file into a DataFrame&lt;BR /&gt;# local_path is on the driver's local disk, so it needs the file: scheme; a bare path is resolved against DBFS&lt;BR /&gt;df = spark.read.csv(f"file:{local_path}", header=True)&lt;BR /&gt;```&lt;BR /&gt;Based on the error message and your setup, there are several potential issues with your SFTP connection in Databricks. Let's address them:&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Connection Issues Analysis&lt;BR /&gt;The error message indicates a connection problem to your SFTP server. This could be due to:&lt;BR /&gt;1. Network connectivity issues between Databricks and your SFTP server&lt;BR /&gt;2. Incorrect credentials or configuration parameters&lt;BR /&gt;3. Library compatibility problems with your Databricks runtime&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Solutions to Try&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;1. Check Library Compatibility&lt;BR /&gt;Ensure you're using a version of the spark-sftp library that is compatible with your Databricks runtime. For Spark 3.x with Scala 2.12, use the first Maven coordinate listed above (spark-sftp_2.12). If you're using an older Databricks runtime with Spark 2.x, use the second one (spark-sftp_2.11).&lt;/P&gt;
&lt;P&gt;2. Fix Connection Parameters&lt;BR /&gt;Try the revised Scala code above, which sets each parameter explicitly.&lt;/P&gt;
&lt;P&gt;3. Network Configuration&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Make sure Databricks can reach your SFTP server:&lt;BR /&gt;1. Check if your SFTP server allows connections from Databricks IP ranges&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;2. Verify the port (typically 22) is open for SFTP connections&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;3. Run the diagnostic commands (see above) in a Databricks notebook to check connectivity:&lt;/P&gt;
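&lt;P&gt;If `dig` or `nc` is not available on the cluster, the same port check can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original diagnostics: the helper name `can_reach` and its default values are illustrative, and the host is whatever your SFTP endpoint is.&lt;/P&gt;
```python
import socket


def can_reach(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the hostname and opens a TCP socket,
        # so a DNS failure, a refused connection, and a timeout all
        # surface here as OSError (or one of its subclasses).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```
&lt;P&gt;Calling `can_reach("your-sftp-host.com")` from a notebook cell runs the check from the driver node. A `False` result means DNS resolution, routing, or a firewall rule is blocking the connection; it does not say which one.&lt;/P&gt;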
&lt;P&gt;4. Alternative Approach Using Paramiko&lt;BR /&gt;If the spark-sftp connector continues to cause issues, try using the Paramiko library: (see above)&lt;/P&gt;
&lt;P&gt;5. Check for Scala Version Mismatch&lt;/P&gt;
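&lt;P&gt;The error `java.lang.NoClassDefFoundError: scala/Product$class` points to a Scala version mismatch: it typically appears when a library built for Scala 2.11 (such as spark-sftp_2.11) is loaded on a runtime that uses Scala 2.12. Make sure the Scala suffix of the library matches your Databricks runtime's Scala version (likely 2.12 for newer runtimes).&lt;/P&gt;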
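&lt;P&gt;The suffix check can be sketched as a small Python helper. This is only an illustration: the function names `scala_suffix` and `matches_runtime` are hypothetical, and the runtime's Scala version is passed in by hand (it is listed in the release notes for each Databricks runtime).&lt;/P&gt;
```python
import re


def scala_suffix(coordinate):
    """Extract the Scala binary version from a Maven coordinate, or None.

    "com.springml:spark-sftp_2.11:1.1.5" yields "2.11".
    """
    parts = [p.strip() for p in coordinate.split(":")]
    if len(parts) != 3:
        raise ValueError("expected group:artifact:version, got %r" % coordinate)
    # Cross-built Scala artifacts end in _MAJOR.MINOR, e.g. spark-sftp_2.11
    match = re.search(r"_(\d+\.\d+)$", parts[1])
    return match.group(1) if match else None


def matches_runtime(coordinate, runtime_scala_version):
    """True if the artifact's Scala suffix equals the runtime's major.minor."""
    suffix = scala_suffix(coordinate)
    if suffix is None:
        # Assumption: an artifact without a suffix is not Scala-specific,
        # so there is nothing to compare.
        return True
    runtime_binary = ".".join(runtime_scala_version.split(".")[:2])
    return suffix == runtime_binary
```
&lt;P&gt;With the coordinate from the question, `matches_runtime("com.springml:spark-sftp_2.11:1.1.5", "2.12.15")` is `False`, which is the situation that produces the `scala/Product$class` error.&lt;/P&gt;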
&lt;P&gt;Remember to whitelist the Databricks IP ranges on your SFTP server, not just your local machine's IP, as the actual connection will be made from Databricks compute nodes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Apr 2025 18:53:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/trying-to-connect-sftp-directly-in-databricks/m-p/114749#M44931</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-04-07T18:53:24Z</dc:date>
    </item>
  </channel>
</rss>

