07-17-2023 07:32 AM
Hello,
I don't know if this is possible, but I am wondering: can Auto Loader ingest files directly from an SFTP server? Or do I have to first copy the files to my DBFS and then use Auto Loader on that location?
Thank you!
07-17-2023 07:38 AM
Hello Etienne,
I don't think it's possible. Auto Loader is used to process new files as they arrive in cloud storage, as described in the documentation.
Moreover, to follow the Medallion Architecture, it's better to first ingest the files into DBFS and then use Auto Loader on that location.
07-17-2023 07:40 AM
Thank you @BriceBuso!
So what is the recommended best practice for ingesting data from SFTP sources?
07-17-2023 07:56 AM
Hey,
There are two possible ways.
The first is to use this Spark library to connect to the SFTP server: https://github.com/springml/spark-sftp
You connect to the SFTP server and load the data into your storage.
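For example, here is a minimal PySpark sketch, assuming the spark-sftp library is installed on the cluster; the host, secret scope, and paths below are placeholders:

# Read a CSV file from the SFTP server into a DataFrame with spark-sftp
# (host, credentials and paths are placeholders)
df = (spark.read
      .format("com.springml.spark.sftp")
      .option("host", "sftp.example.com")
      .option("username", "my_user")
      .option("password", dbutils.secrets.get("my_scope", "sftp_password"))
      .option("fileType", "csv")
      .option("inferSchema", "true")
      .load("/remote/path/data.csv"))

# Land the data in your cloud storage so Auto Loader or a batch job can pick it up later
df.write.mode("append").parquet("abfss://landing@mystorageaccount.dfs.core.windows.net/sftp/")

Note that the library targets older Spark versions, so check compatibility with your Databricks Runtime before relying on it.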
The second one is to use Azure Data Factory (if you are on Azure; I don't know the equivalent in other cloud providers) to ingest the data into a blob container mounted in your Databricks workspace. When your ADF pipeline finishes, it triggers your Databricks pipeline.
Cheers 🙂
07-17-2023 08:51 PM
Hi @erigaud
We haven't heard from you since the last response from @BriceBuso, and I was checking back to see if the suggestions helped you.
If you have found a solution, please share it with the community, as it can be helpful to others.
Also, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.
07-17-2023 11:26 PM
His suggestion helped; I accepted it as the answer. Thank you!
a week ago
There is one more option: on Azure you can enable SFTP on Blob Storage, so it acts as an SFTP server. On the other end, you can mount that storage in Databricks as a volume and then use Auto Loader, file arrival triggers, etc. on it.
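As a minimal sketch of that last step, reading from a Unity Catalog volume backed by the SFTP-enabled container with Auto Loader (the catalog, schema, volume, and table names below are placeholders):

# Incrementally ingest new CSV files landing in the volume with Auto Loader
# (all paths and table names are placeholders)
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/Volumes/my_catalog/my_schema/checkpoints/sftp_schema")
      .load("/Volumes/my_catalog/my_schema/sftp_landing/"))

(df.writeStream
   .option("checkpointLocation", "/Volumes/my_catalog/my_schema/checkpoints/sftp_stream")
   .trigger(availableNow=True)
   .toTable("my_catalog.my_schema.sftp_bronze"))

You can then schedule this as a job with a file arrival trigger on the volume so it only runs when new files show up.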