Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

SFTP Autoloader

erigaud
Honored Contributor

Hello, 

I don't know if it is possible, but I am wondering: can I ingest files from an SFTP server using Auto Loader? Or do I have to first copy the files to DBFS and then use Auto Loader on that location?

Thank you !


BriceBuso
Contributor II

Hello Etienne, 

I think it's not possible. Auto Loader is used to process new files arriving in cloud storage, as stated in the documentation:

(screenshot of the Auto Loader documentation attached)

Moreover, to respect the medallion architecture, it's better to first ingest the files into DBFS (or cloud storage) and then use Auto Loader on that location.
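To illustrate the copy-then-ingest approach, here is a minimal PySpark sketch, assuming the SFTP files have already been landed in a cloud storage path. The storage path, schema location, checkpoint location, and table name are illustrative assumptions, not from this thread:

```python
# Hedged sketch: Auto Loader picking up files that were previously copied
# from the SFTP server into cloud storage. All paths below are assumptions.

def autoloader_options(fmt: str, schema_location: str) -> dict:
    """Build the cloudFiles option map used by Auto Loader."""
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.schemaLocation": schema_location,
    }

# In a Databricks notebook (where `spark` is predefined):
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**autoloader_options("csv", "/tmp/schema/sftp_drop"))
#         .load("abfss://landing@mystorage.dfs.core.windows.net/sftp-drop/"))
# (df.writeStream
#    .option("checkpointLocation", "/tmp/checkpoints/sftp_drop")
#    .table("bronze.sftp_files"))
```

The streaming calls are commented out because they only run inside a Databricks runtime; the option map itself is plain Python.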

 

erigaud
Honored Contributor

Thank you @BriceBuso!

So what is the recommended best practice to ingest data from SFTP data sources?

Hey, 

There are two possible ways. 

The first is to use this Spark library to connect to the SFTP server: https://github.com/springml/spark-sftp 
You connect to the SFTP server and load the data into your storage. 

The second is to use Azure Data Factory (if you are on Azure; I don't know the equivalent in other cloud providers) to ingest the data into a blob container mounted in your Databricks workspace. When your ADF pipeline finishes, it triggers your Databricks pipeline. 

Cheers 🙂 

 

Anonymous
Not applicable

Hi @erigaud,

We haven't heard from you since the last response from @BriceBuso, and I was checking back to see if his suggestions helped you.

Otherwise, if you have a solution, please share it with the community, as it can be helpful to others. 

Also, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

erigaud
Honored Contributor

His suggestion helped, and I accepted it as the answer. Thank you!

 
