Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

SFTP Autoloader

erigaud
Honored Contributor

Hello, 

I don't know if it is possible, but I am wondering: can I ingest files from an SFTP server using Auto Loader? Or do I have to first copy the files to DBFS and then use Auto Loader on that location?

Thank you !


BriceBuso
Contributor II

Hello Etienne, 

I think it's not possible. Auto Loader is used to process new files arriving in cloud storage, as stated in the documentation:

(screenshot of the Auto Loader documentation attached)

Moreover, to respect the medallion architecture, it's better to first ingest the files into DBFS (or cloud storage) and then use Auto Loader on that location.
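To illustrate the copy-then-ingest approach, here is a minimal PySpark sketch, assuming the SFTP files have already been landed in a cloud storage path. The storage path, schema location, checkpoint location, and table name are illustrative assumptions, not from this thread:

```python
# Hedged sketch: Auto Loader picking up files that were previously copied
# from the SFTP server into cloud storage. All paths below are assumptions.

def autoloader_options(fmt: str, schema_location: str) -> dict:
    """Build the cloudFiles option map used by Auto Loader."""
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.schemaLocation": schema_location,
    }

# In a Databricks notebook (where `spark` is predefined):
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**autoloader_options("csv", "/tmp/schema/sftp_drop"))
#         .load("abfss://landing@mystorage.dfs.core.windows.net/sftp-drop/"))
# (df.writeStream
#    .option("checkpointLocation", "/tmp/checkpoints/sftp_drop")
#    .table("bronze.sftp_files"))
```

The streaming calls are commented out because they only run inside a Databricks runtime; the option map itself is plain Python.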

 

erigaud
Honored Contributor

Thank you @BriceBuso!

So what is the recommended best practice to ingest data from SFTP data sources?

Hey, 

There are two possible ways. 

The first is to use this Spark library to connect to the SFTP server: https://github.com/springml/spark-sftp 
You connect to the SFTP server and load the data into your storage. 

The second is to use Azure Data Factory (if you are on Azure; I don't know the equivalent in other cloud providers) to ingest the data into a blob container mounted in your Databricks workspace. When your ADF pipeline finishes, it triggers your Databricks pipeline. 

Cheers 🙂 

 

Anonymous
Not applicable

Hi @erigaud,

We haven't heard from you since the last response from @BriceBuso, and I was checking back to see if his suggestions helped you.

Otherwise, if you have a solution, please share it with the community, as it can be helpful to others. 

Also, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

erigaud
Honored Contributor

His suggestion helped, and I accepted it as the answer. Thank you!

 
