4 hours ago
Hi @maikel,
Not exactly. If you're using a Databricks file arrival trigger, it doesn't fire instantly when a file is uploaded. It makes a best-effort check roughly every minute, so it's better to think of it as near-real-time rather than immediate execution. In that setup, the usual pattern is to let the file arrival trigger start the job, and then use Auto Loader inside the job with trigger(availableNow=True) so it processes everything that has arrived since the last run and then exits cleanly.
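As a rough illustration of that pattern, here is a minimal sketch of the job's Auto Loader step. The function names, the JSON format, the paths, and the target table are all assumptions for the example, not anything from your setup, and I'm reusing the checkpoint path as the schema location purely for brevity.

```python
def autoloader_options(file_format, schema_location):
    """Build the Auto Loader reader options (helper name is hypothetical)."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def run_incremental_load(spark, source_path, checkpoint_path, target_table):
    """Process every file that arrived since the last run, then stop.

    `spark` is the SparkSession that a Databricks job or notebook provides.
    """
    query = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options("json", checkpoint_path))  # "json" is an assumption
        .load(source_path)
        .writeStream
        # The checkpoint is what lets the next run skip already-processed files.
        .option("checkpointLocation", checkpoint_path)
        # availableNow: drain the current backlog, then terminate the stream.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
    # Block until the backlog is processed so the job task ends cleanly.
    query.awaitTermination()
```

In the job task you would call something like run_incremental_load(spark, "/Volumes/main/raw/landing", "/Volumes/main/raw/_checkpoints/landing", "main.bronze.events"), with those placeholder paths swapped for your own. The file arrival trigger only decides when the job starts; the checkpoint decides which files each run picks up.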
If you need lower latency than that, then yes, you're generally moving away from a file-arrival-triggered batch pattern and into a long-running streaming workload. That said, I wouldn't position trigger(processingTime="30 seconds") as the only option, or even the default recommendation. Databricks recommends file arrival triggers for event-driven pipelines, and if you do use time-based streaming triggers, the guidance is to start with an interval of about 1 minute or longer. For very latency-sensitive use cases, Databricks also suggests considering the classic file notification mode, since managed file events add an extra caching hop that can increase latency slightly.
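If you do go the long-running route, a sketch could look like the following. Again, the function names, file format, and default interval are assumptions for illustration. Classic file notification mode is switched on with the cloudFiles.useNotifications option, and it usually needs extra cloud-specific permissions and settings that I've left out here.

```python
def notification_options(file_format, schema_location):
    """Auto Loader options with classic file notification mode enabled
    (helper name is hypothetical)."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # Discover new files via cloud notifications instead of directory listing.
        "cloudFiles.useNotifications": "true",
    }


def start_continuous_load(spark, source_path, checkpoint_path, target_table,
                          interval="1 minute"):
    """Start a long-running stream that fires a micro-batch every `interval`.

    Returns the StreamingQuery; the caller decides whether to block on it.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**notification_options("json", checkpoint_path))  # "json" is an assumption
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        # Time-based trigger; this sketch defaults to the 1-minute starting point.
        .trigger(processingTime=interval)
        .toTable(target_table)
    )
```

Unlike the availableNow version, this query never finishes on its own, so it belongs in a continuously running job rather than behind a file arrival trigger.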
Hope this helps.
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***