Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unreliable file events on Azure Storage (SFTP) for job trigger

Dimitry
Contributor III

Hi all

I have a job triggered by a file event on an external location.

The location and the job trigger work fine when I upload a file via the Azure Portal.

I need an SFTP trigger, so I went into Event Grid, found the subscription for the storage account in question, and modified its filters by adding SftpCommit and SftpRename (the third party sometimes uploads a .part file and renames it).
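For reference, this is roughly how that filter change can be scripted. A minimal sketch with the azure-mgmt-eventgrid SDK; all IDs and names are placeholders, and it assumes the Databricks-created subscription filters on data.api:

```python
# Hedged sketch: extend the Event Grid subscription's data.api filter with the
# SFTP operations. Subscription name, scope, and IDs are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventgrid import EventGridManagementClient
from azure.mgmt.eventgrid.models import StringInAdvancedFilter

AZURE_SUB = "<azure-subscription-id>"
SCOPE = ("/subscriptions/<sub>/resourceGroups/<rg>"
         "/providers/Microsoft.Storage/storageAccounts/<account>")
EVENT_SUB = "<databricks-created-event-subscription>"

client = EventGridManagementClient(DefaultAzureCredential(), AZURE_SUB)
sub = client.event_subscriptions.get(SCOPE, EVENT_SUB)

# Extend an existing data.api filter if present; advanced filters are ANDed,
# so appending a second filter on the same key would match nothing.
filters = sub.filter.advanced_filters or []
api_filter = next((f for f in filters if getattr(f, "key", "") == "data.api"), None)
if api_filter is not None:
    api_filter.values = sorted(set(api_filter.values) | {"SftpCommit", "SftpRename"})
else:
    filters.append(StringInAdvancedFilter(key="data.api",
                                          values=["SftpCommit", "SftpRename"]))
    sub.filter.advanced_filters = filters

client.event_subscriptions.begin_create_or_update(SCOPE, EVENT_SUB, sub).result()
```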

 

When I upload a sample file manually over SFTP, I can see the event in Event Grid, displayed as a matched event.

A second later I see a message in the Azure Storage account queue (the queue is created automatically by Databricks for this external location). Then the message disappears.

Job... nothing.

 

2nd part of the riddle.

Files arrive ONLY via SFTP.

The job does trigger, but completely out of sync with the events.

I checked the events in the storage log (local time zone):

[screenshot: storage account event log, local time zone]

And here are the job runs:

[screenshot: job run history]

They are hours out of sync with the queue.

The job is configured with queueing enabled and 1 concurrent run, and a cluster is dedicated to the job.
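For reference, a minimal sketch of that job configuration via the Databricks Jobs API 2.1; host, token, paths, and cluster ID are placeholders, not the actual setup:

```python
# Hedged sketch: create a job with a file arrival trigger, queueing enabled,
# and one concurrent run. All identifiers are placeholders.
import requests

HOST = "https://<workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

settings = {
    "name": "sftp-file-arrival-job",
    "max_concurrent_runs": 1,            # one run at a time
    "queue": {"enabled": True},          # queue extra triggers instead of skipping
    "trigger": {
        "file_arrival": {
            "url": "abfss://<container>@<account>.dfs.core.windows.net/stock/"
        }
    },
    "tasks": [{
        "task_key": "ingest",
        "notebook_task": {"notebook_path": "/Jobs/ingest_stock"},  # hypothetical
        "existing_cluster_id": "<dedicated-cluster-id>",
    }],
}

resp = requests.post(f"{HOST}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {TOKEN}"},
                     json=settings)
resp.raise_for_status()
print(resp.json())  # {"job_id": ...}
```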

Anyone seen this before? What is happening?


1 ACCEPTED SOLUTION

FYI - look at this: Solved: Job File Event Trigger not firing for SftpCommit a... - Databricks Community - 128356

The same solution is posted there too. The trick is: you DON'T use events. Just turn them off. No queue.

In that mode the file arrival trigger polls for files instead of consuming queue messages, and it runs regardless of how the file arrived. That's the solution!

View solution in original post

11 REPLIES

Dimitry
Contributor III

Update

It appears that even uploading via the UI does not trigger it any more. It did trigger weeks ago.

I have just uploaded a file in the UI and saw this message in the storage queue:

```json
{
  "topic": "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Storage/storageAccounts/xxx",
  "subject": "/blobServices/default/containers/kumho/blobs/stock/test.csv",
  "eventType": "Microsoft.Storage.BlobCreated",
  "id": "d85d530f-401e-0043-7d65-1ca903060b80",
  "data": {
    "api": "PutBlob",
    "requestId": "d85d530f-401e-0043-7d65-1ca903000000",
    "eTag": "0x8DDEA7C98CE433C",
    "contentType": "text/csv",
    "contentLength": 34300,
    "blobType": "BlockBlob",
    "accessTier": "Default",
    "blobUrl": "https://xxx.blob.core.windows.net/kumho/stock/test.csv",
    "url": "https://xxx.blob.core.windows.net/kumho/stock/test.csv",
    "sequencer": "00000000000000000000000000036fc90000000000156c90",
    "identity": "$superuser",
    "storageDiagnostics": {
      "batchId": "1616f433-a006-005b-0065-1c7664000000"
    }
  },
  "dataVersion": "",
  "metadataVersion": "1",
  "eventTime": "2025-09-02T23:58:21.6954353Z"
}
```

The job didn't trigger at all.

The UI issue turned out to be driven by permissions: the job's "Run as" user needs access to the external location. Once that was granted, it started to trigger.
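In Unity Catalog terms, that grant can be done in SQL. A minimal sketch from a notebook (the location and user names are hypothetical; `spark` is ambient in Databricks):

```python
# Hedged sketch: grant the job's "Run as" principal read access to the
# external location. Both names are placeholders.
spark.sql("""
    GRANT READ FILES ON EXTERNAL LOCATION `my_sftp_location`
    TO `run-as-user@example.com`
""")
```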

SFTP still does not trigger anything, regardless. Only the HTTPS protocol does.

The SFTP events come through and create a message.

Databricks looks at the message and discards it. It seems to check for very specific event types, so not every message in the storage account queue is respected.
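One way to see what actually lands in the queue (before Databricks consumes it) is to peek at the messages. A minimal sketch with azure-storage-queue; the connection string and queue name are placeholders, and Event Grid delivers its payload Base64-encoded:

```python
# Hedged sketch: peek the Databricks-created queue and print each event's
# type and api field, to see which events get delivered at all.
import json
from azure.storage.queue import QueueClient, BinaryBase64DecodePolicy

queue = QueueClient.from_connection_string(
    "<storage-connection-string>",       # placeholder
    "<databricks-created-queue-name>",   # placeholder
    message_decode_policy=BinaryBase64DecodePolicy(),
)

# peek_messages does not dequeue, so Databricks still sees the messages
for msg in queue.peek_messages(max_messages=32):
    event = json.loads(msg.content)
    print(event["eventType"], event["data"].get("api"), event["subject"])
```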

Can anyone advise on this if you've managed to get an SFTP trigger working?

szymon_dybczak
Esteemed Contributor III

Hi @Dimitry ,

I was investigating a similar case in another thread, and in my case it worked as expected. Here's the reply I posted back then; try following all the steps from scratch:

Solved: Job File Event Trigger not firing for SftpCommit a... - Databricks Community - 128356

"Ok, I've recreated your scenario (more or less). So I enabled SFTP on my storage account and created a home directory for my SFTP user:

[screenshot: SFTP local user with home directory on the storage account]

Then, in Databricks, I enabled file events for the external location (which is recommended). To enable them, you need to make sure your Unity Catalog access connector has the appropriate permissions, so check that its managed identity has the following roles:

  • Storage Account Contributor
  • Storage Blob Data Contributor
  • EventGrid EventSubscription Contributor
  • Storage Queue Data Contributor: required only if you want Azure Databricks to create the event subscription and queue in Azure Data Lake Storage for you. (If you do not enable this role, you must create the Azure storage queue yourself.)

Manage external locations - Azure Databricks | Microsoft Learn
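A quick way to verify those assignments is to list what the access connector's managed identity holds on the storage account. A minimal sketch with azure-mgmt-authorization; all IDs are placeholders:

```python
# Hedged sketch: print the role names the access connector's managed identity
# has on the storage account, to check the four roles listed above.
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient

AZURE_SUB = "<azure-subscription-id>"
SCOPE = ("/subscriptions/<sub>/resourceGroups/<rg>"
         "/providers/Microsoft.Storage/storageAccounts/<account>")
PRINCIPAL_ID = "<access-connector-managed-identity-object-id>"

client = AuthorizationManagementClient(DefaultAzureCredential(), AZURE_SUB)
for ra in client.role_assignments.list_for_scope(
        SCOPE, filter=f"principalId eq '{PRINCIPAL_ID}'"):
    # Resolve the role definition ID to a human-readable role name
    role = client.role_definitions.get_by_id(ra.role_definition_id)
    print(role.role_name)
```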

Next, you need to enable file events for your external location. Go to Unity Catalog and click external locations.

[screenshot: Unity Catalog external locations list]

 

Select the one for which you want to enable file events. Once you are inside the external location, click the Edit button:

[screenshot: external location details with Edit button]


Then tick Enable file events and click Auto-fill access connector ID:

[screenshot: Enable file events checkbox with Auto-fill access connector ID]

 

Now you can configure your job with a file arrival trigger. If everything went smoothly, you should see an Event Grid system topic and event subscription created for you by Databricks in your storage account:

[screenshot: Event Grid system topic and event subscription in the storage account]

 

I've tested it by uploading a file using WinSCP, and the file arrival trigger worked like a charm 🙂"
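For anyone who wants to script that WinSCP-style test, a minimal sketch with paramiko (host, credentials, and paths are placeholders; Azure Blob Storage SFTP usernames take the form <account>.<localuser>):

```python
# Hedged sketch: upload a test file over SFTP, mimicking the WinSCP test above.
import paramiko

transport = paramiko.Transport(("<account>.blob.core.windows.net", 22))
transport.connect(username="<account>.<sftpuser>", password="<password>")
sftp = paramiko.SFTPClient.from_transport(transport)

# Completing the upload should raise the SftpCommit operation on the storage side
sftp.put("test.csv", "stock/test.csv")

sftp.close()
transport.close()
```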


I have a bunch of external locations created by Databricks out of the box, with file events enabled and all the checkboxes ticked. None of them trigger when I upload a file via SFTP.

Can you query the logs on your storage account and see what protocol and what events WinSCP generated?

In my case events are SftpCommit and SftpRename.

[screenshot: storage log showing SftpCommit and SftpRename operations]
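If the storage account's diagnostic logs go to a Log Analytics workspace, the protocol and operation per upload can be queried directly. A minimal sketch with azure-monitor-query; the workspace ID is a placeholder:

```python
# Hedged sketch: list recent blob operations with their protocol (HTTPS vs.
# SFTP) from the StorageBlobLogs diagnostic table.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
query = """
StorageBlobLogs
| where TimeGenerated > ago(1d)
| project TimeGenerated, OperationName, Protocol, Uri, StatusText
| order by TimeGenerated desc
"""
result = client.query_workspace("<log-analytics-workspace-id>", query,
                                timespan=timedelta(days=1))
for row in result.tables[0].rows:
    print(list(row))
```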

 

Note that when you enable file events, the automatic setup in Event Grid looks like the following:

[screenshot: default Event Grid subscription filters created by Databricks]

None of these are generated by SFTP when simply uploading a file (not sure about deletion etc.).

In this case the storage account queue for the external location does not receive a message.

I added SftpCommit to the list of filters in Event Grid. The message arrives, Databricks processes it, and... nothing.

I'd be interested to know how it was triggered in your case, and over what protocol.

 

szymon_dybczak
Esteemed Contributor III

Hi @Dimitry ,

Unfortunately, I don't currently have access to my company's Visual Studio subscription, so I can't recreate it. But maybe you've found a bug? If you set up the SftpCommit and SftpCreate events in the system topic's advanced filters, Event Grid properly forwards the events to the storage queue, and those events are consumed by the file arrival trigger yet the job doesn't start, then that looks like a bug to me.

Dimitry
Contributor III

Perhaps, but it would be great if someone from the Databricks side could confirm.

To me it looks like Databricks subscribes to a wider list of events at the Azure level but internally has its own filters and only cares about some of them, and Sftp* is not on the allow list.

Look at this response from Azure Support to my question: Storage account file event is not triggered for Azure Databricks on SFTP upload - Microsoft Q&A

------------------------------

Thanks for your details. I can confirm that the behavior you are seeing is expected today.

  • Databricks file arrival triggers currently only support standard blob events like BlobCreated/BlobDeleted.
  • SFTP-specific events such as SftpCommit or SftpCreate are raised correctly by Azure Storage and delivered via Event Grid, but Databricks job triggers do not consume them.
  • This is a Databricks product limitation, not an Azure Storage/Event Grid configuration issue.

References:

Workarounds:

  • If HTTPS/Blob uploads are possible, they will trigger jobs as expected.
  • If SFTP ingestion is required, you can configure an Event Grid subscription on SftpCommit and use an Azure Function / Logic App / REST API call to trigger the Databricks job.

We have flagged this to the Databricks product team since support for SFTP-based triggers would need to come from their side.

Thanks again for raising this - your findings will help others facing the same scenario.

I hope this information helps. Please do let us know if you have any further queries.
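For reference, the Function-based workaround described in that reply could look roughly like this. A minimal sketch of an Azure Function (Python v1 programming model; the Event Grid binding in function.json is omitted) that starts the Databricks job via the Jobs API; host, token, and job ID are placeholders read from app settings:

```python
# Hedged sketch: Event Grid-triggered Azure Function that calls the Databricks
# Jobs run-now endpoint when an SFTP upload completes.
import os
import requests
import azure.functions as func

def main(event: func.EventGridEvent):
    data = event.get_json()
    # Only react to completed SFTP uploads
    if data.get("api") not in ("SftpCommit", "SftpCreate"):
        return
    resp = requests.post(
        f"{os.environ['DATABRICKS_HOST']}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
        json={"job_id": int(os.environ["DATABRICKS_JOB_ID"])},
    )
    resp.raise_for_status()
```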

szymon_dybczak
Esteemed Contributor III

Hi @Dimitry ,

Thank you very much for sharing this with us. It would be nice to have this feature added to Databricks. Could you mark your answer as the solution? This will help others with a similar issue find the answer faster.

Dimitry
Contributor III

FYI - look at this: Solved: Job File Event Trigger not firing for SftpCommit a... - Databricks Community - 128356

The same solution is posted there too. The trick is: you DON'T use events. Just turn them off. No queue.

In that mode the file arrival trigger polls for files instead of consuming queue messages, and it runs regardless of how the file arrived. That's the solution!

szymon_dybczak
Esteemed Contributor III

Great one! Still, it would be nice to have support for SFTP events, because with polling you're limited to 10k files. But I think this is a great workaround for most of us 🙂

Dimitry
Contributor III

Yeah, but in my case I just archive each file as soon as it gets processed, so there will be a handful at most. I guess the majority of cases are like that, and when you need more in one place... perhaps you have the budget to incorporate ADF / a Function / a Logic App. I have about a dozen containers, one per supplier, so for me an external solution to trigger these and map them onto the same number of jobs was looking like spending the next weekend away from the family.
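For what it's worth, the archive-on-process pattern can live inside the triggered notebook itself. A minimal sketch; the paths and target table are hypothetical, and `spark`/`dbutils` are ambient in Databricks:

```python
# Hedged sketch: process each file, then move it out of the watched folder so
# the polling-based trigger stays well under the 10k file limit.
src_dir = "abfss://<container>@<account>.dfs.core.windows.net/stock/"
archive = "abfss://<container>@<account>.dfs.core.windows.net/archive/"

for f in dbutils.fs.ls(src_dir):
    df = spark.read.option("header", "true").csv(f.path)
    df.write.mode("append").saveAsTable("bronze.stock")  # hypothetical target
    dbutils.fs.mv(f.path, archive + f.name)
```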
