07-09-2025 02:05 PM
I have a job with one task, which is to run a notebook. The job is set up with a file arrival trigger using my blob storage as the location. The trigger works and the job starts when a new file arrives in the location, but it does not run once per file when multiple files arrive.
For example, I had three files uploaded at different times: the first at 3:57:03, the second at 3:57:07, and the last at 3:57:10. Three new files arrived, but only one job run was started. Why did three runs not get queued?
07-09-2025 02:23 PM
Did you overwrite a file with the same name? Overwriting an existing file with a file of the same name does not trigger a run.
07-09-2025 06:46 PM
No, each file had a unique name.
07-09-2025 02:25 PM
Check whether you have configured these two options: Minimum time between triggers and Wait after last change.
07-09-2025 06:51 PM
They are both set to 00h 00m.
07-09-2025 02:30 PM
Hi @Sneeze7432,
I think it could be caused by the following option, Wait after last change in seconds. According to the documentation:
"The time to wait to trigger a run after file arrival. Another file arrival in this period resets the timer. This setting can be used when files arrive in batches, and the whole batch needs to be processed after all files have arrived."
An important thing to keep in mind is that "another file arrival in this period resets the timer". Put differently, if files arrive continuously, your Workflow will never start, as its execution will be continuously delayed. For that reason this setting should be used only when you want to process files as a batch.
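For reference, this is roughly how those two settings look when configured through the Jobs API instead of the UI. This is just a sketch from memory, not a verified payload: the workspace URL, token, and job_id are placeholders, and the field names are my recollection of Jobs API 2.1, so double-check them against the docs.

```python
# Rough sketch of configuring the file arrival trigger via the Jobs API.
# Workspace URL, token, and job_id are placeholders; field names are my
# recollection of Jobs API 2.1 and worth verifying against the docs.
import requests

resp = requests.post(
    "https://<workspace-url>/api/2.1/jobs/update",
    headers={"Authorization": "Bearer <token>"},
    json={
        "job_id": 123,
        "new_settings": {
            "trigger": {
                "pause_status": "UNPAUSED",
                "file_arrival": {
                    "url": "abfss://<container>@<account>.dfs.core.windows.net/landing/",
                    # Both debounce windows set to 0 = fire as soon as possible.
                    # Files arriving within one evaluation cycle may still be
                    # coalesced into a single run.
                    "min_time_between_triggers_seconds": 0,
                    "wait_after_last_change_seconds": 0,
                },
            }
        },
    },
)
resp.raise_for_status()
```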
07-09-2025 06:52 PM
I have the "Wait after last change" setting set to 00h 00m which I would assume means that immediately after a file drops in the storage location the job run will start. I would also assume that means if I drop multiple files in the same location multiple jobs should start, and based on my concurrency limits some may have to be queued.
07-10-2025 12:05 AM
I'm just guessing, because unfortunately we don't have insight into how this was implemented, but it seems to me that the Databricks engineers treat files uploaded within a short time interval as a single batch, most likely for optimization purposes. If a trigger were generated every second, it wouldn't be a very efficient approach.
Even that option is specified in minutes (as if they assume that anything below a minute would still be treated as a single batch).
07-10-2025 11:37 AM
What doesn't make sense is that the notification bar will tell me "3 new files" but only one job runs. So even though it can display the number of new files between checks, it will still only start one run?
I don't know; it doesn't seem to be set up very well.
07-11-2025 12:05 AM
Maybe a Databricks employee will jump in and shed some light on the implementation details. But to me, treating files that arrive within a very short interval as one batch is quite a reasonable approach to avoid a massive number of triggers.
07-11-2025 05:04 PM
Same, I would really appreciate more details around this.
07-10-2025 12:51 AM
It looks like the trigger processes files in batches, which means each uploaded file doesn't create a new run of the job.
If you need to process files immediately or separately, you can play with the Auto Loader configuration, as sketched below.
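For example, something along these lines inside the triggered notebook lets a single run pick up every file that arrived since the previous run, so it doesn't matter that three arrivals produced only one run. Just a sketch: the landing path, file format, checkpoint locations, and target table are placeholders.

```python
# Auto Loader sketch: one triggered run drains the whole backlog of new files.
# Paths, format, and table name below are placeholders for illustration.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")  # format of the arriving files
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")
    .load("abfss://<container>@<account>.dfs.core.windows.net/landing/")
)

query = (
    df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/ingest")
    .trigger(availableNow=True)  # process everything new, then stop
    .toTable("my_catalog.my_schema.ingested_files")
)
query.awaitTermination()  # returns once the backlog is drained
```

With this pattern the checkpoint tracks which files have already been ingested, so even if several arrivals collapse into one job run, no file is missed or processed twice.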
07-10-2025 07:47 AM
@Sneeze7432 you can also try editing the max concurrent runs in the workflow.
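If you'd rather script it than click through the UI, it's roughly this (a sketch; the workspace URL, token, job_id, and value are placeholders):

```python
# Sketch: raise max concurrent runs so overlapping triggers can queue or run
# in parallel instead of being skipped. job_id and the value are placeholders.
import requests

resp = requests.post(
    "https://<workspace-url>/api/2.1/jobs/update",
    headers={"Authorization": "Bearer <token>"},
    json={"job_id": 123, "new_settings": {"max_concurrent_runs": 3}},
)
resp.raise_for_status()
```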
07-11-2025 05:01 PM
That doesn't solve the problem of runs not queueing. It would actually not be good, because I could have multiple runs writing to the same location and potentially overwriting each other, creating inaccurate data.