Data Engineering

SpringML SFTP with Spark 3.x

Unimog
New Contributor III

Is there a version of SpringML spark-sftp that works with Spark 3.x and Scala 2.12? If so, can you point me to it or explain how to load it on my compute?

Accepted Solution

Louis_Frolio
Databricks Employee

The SpringML Spark-SFTP library does not natively support Apache Spark 3.x and Scala 2.12. The library has not been actively maintained; its last documented commit was made in April 2019. This leads to several issues:

  1. Lack of Support for Spark 3.x: The SpringML Spark-SFTP library is known to be incompatible with Spark 3.x.

  2. Incorrect Implementation: The library is designed as a Spark DataSource rather than as a Hadoop FileSystem, which limits how it works with common formats such as CSV, Parquet, and JSON on a Spark platform.

  3. Unsupported File System Schemes: The library only supports the hdfs:// file system scheme, which makes it incompatible with the Databricks runtime.

  4. No Further Maintenance: Without recent updates from the maintainers, this library is an unreliable dependency for modern Spark and Scala environments.

Hope this helps, Louis.


Unimog
New Contributor III

Thanks! Are there any alternatives you would recommend for SFTP within Databricks?

Louis_Frolio
Databricks Employee

For Python, you might want to look at Paramiko, which could be an option. You could also look at ETL tools such as Airbyte, Rivery, or CData.
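If you go the Paramiko route, one common pattern is to pull files from the SFTP server onto storage the cluster can read and then use Spark's normal readers on them. This is a minimal sketch, assuming Paramiko is installed on the cluster (for example with `%pip install paramiko`), that the server's host key is already in a known_hosts file the driver can load, and that password authentication is used. The function names `download_matching` and `fetch_from_sftp` are hypothetical helpers, not part of Paramiko or Databricks.

```python
import fnmatch
import os
import posixpath


def download_matching(sftp, remote_dir, pattern, local_dir):
    """Copy files in remote_dir whose names match a glob pattern into local_dir.

    `sftp` is any object with paramiko.SFTPClient-style listdir() and get()
    methods. Returns the local paths written, in sorted order.
    """
    os.makedirs(local_dir, exist_ok=True)
    written = []
    for name in sorted(sftp.listdir(remote_dir)):
        if fnmatch.fnmatch(name, pattern):
            # SFTP paths always use forward slashes, so join with posixpath.
            remote_path = posixpath.join(remote_dir, name)
            local_path = os.path.join(local_dir, name)
            sftp.get(remote_path, local_path)
            written.append(local_path)
    return written


def fetch_from_sftp(host, username, password, remote_dir, pattern, local_dir,
                    port=22, known_hosts=None):
    """Open an SFTP session with Paramiko and download matching files.

    Hypothetical helper: host, credentials, and paths are caller-supplied.
    """
    import paramiko  # imported lazily so download_matching works without it

    client = paramiko.SSHClient()
    # SSHClient rejects unknown host keys by default, so load trusted keys.
    if known_hosts:
        client.load_host_keys(known_hosts)
    else:
        client.load_system_host_keys()
    client.connect(host, port=port, username=username, password=password)
    try:
        sftp = client.open_sftp()
        try:
            return download_matching(sftp, remote_dir, pattern, local_dir)
        finally:
            sftp.close()
    finally:
        client.close()
```

As a usage example, you could call `fetch_from_sftp("sftp.example.com", user, pw, "/outbound", "*.csv", "/Volumes/<catalog>/<schema>/<volume>/sftp_landing")` and then `spark.read.csv("/Volumes/<catalog>/<schema>/<volume>/sftp_landing", header=True)`. The host, remote directory, and volume path are placeholders, and reading the password with `dbutils.secrets.get(scope=..., key=...)` avoids hard-coding it. The download runs on the driver only, which keeps things simple but is single-threaded, so for large or frequent transfers one of the ETL tools above may be a better fit. Keeping `download_matching` separate from the connection code also means it can be tested with any stand-in object that has `listdir()` and `get()` methods.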
