09-26-2025 08:21 AM
Hello All,
We currently consume source messages/files with Auto Loader in directory listing mode and want to switch to file notification mode so ingestion is faster and we no longer scan entire directories/folders. I am looking for the following clarifications:
1) When multiple jobs process files at the same time, and we also have to keep latency and throughput in mind for these jobs, what is the better approach: a single shared queue on the storage account or multiple queues?
2) Do we foresee any limitations on how many files can be processed per second/minute with this setup?
thank you
09-27-2025 01:54 AM - edited 09-27-2025 01:55 AM
Hello @saurabh18cs !
You don’t need to choose between queues at all; simply use the “file events” path. When enabled, Databricks uses one managed queue per Unity Catalog external location, and all your streams that read from that location share it. This avoids cloud queue limits, and Databricks auto-tunes the notification plumbing.
Doc: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/file-notification-mode...
If you instead stay on the legacy file notification mode, the default pattern is one queue per stream, and Databricks recommends fan-out only when you’re constrained by provider limits.
I think throughput has no fixed per-second cap: by default Auto Loader processes up to 1,000 files per micro-batch, and you can tune this with cloudFiles.maxFilesPerTrigger.
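For illustration, here is a minimal PySpark sketch of what that could look like. The paths, schema location, and target table are made-up placeholders, and the cloudFiles.useManagedFileEvents option name is my assumption for opting the stream into the file-events path - please verify it against the doc linked above:

```python
# Minimal sketch (Databricks notebook): Auto Loader reading from a UC external
# location with file events enabled, plus an explicit per-batch file cap.
df = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Assumed option name: use the managed file-events path instead of the
        # stream creating its own notification resources (check the docs).
        .option("cloudFiles.useManagedFileEvents", "true")
        # Default is 1,000 files per micro-batch; raise or lower it to trade
        # micro-batch size against latency.
        .option("cloudFiles.maxFilesPerTrigger", 2000)
        .option("cloudFiles.schemaLocation", "abfss://checkpoints@mystorage.dfs.core.windows.net/schemas/orders")
        .load("abfss://landing@mystorage.dfs.core.windows.net/orders/")
)

(
    df.writeStream
        .option("checkpointLocation", "abfss://checkpoints@mystorage.dfs.core.windows.net/chk/orders")
        .trigger(availableNow=True)  # or processingTime="30 seconds" for continuous low-latency runs
        .toTable("main.bronze.orders")
)
```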
Please let me know if you have any further questions.
09-27-2025 02:02 AM - edited 09-27-2025 02:05 AM
Hi @saurabh18cs ,
You are making a very good decision by moving from directory listing to file notification mode. Regarding your questions:
1) Databricks recommends using one shared queue instead of creating one per Auto Loader stream, and this is exactly what the new file notification mode with file events gives you. File notification mode comes in two flavours: the legacy mode, where each stream manages its own cloud queue, and the newer file events mode, where one managed queue per Unity Catalog external location is shared by all streams reading from it.
2) Regarding limitations, the main one in legacy file notification mode - you can have only 500 queues per storage account - is easily overcome by using the recommended file notification mode with file events. Legacy mode is also much worse from a maintenance perspective, because you have to take care of so many queues (a rough sketch of the legacy setup is included after the links below).
Configure Auto Loader streams in file notification mode - Azure Databricks | Microsoft Learn
For Auto Loader with file events mode, the list of limitations is available at the links below:
Configure Auto Loader streams in file notification mode - Azure Databricks | Microsoft Learn
Manage external locations - Azure Databricks | Microsoft Learn
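For contrast, here is a rough sketch of the legacy file notification setup on Azure, where each stream provisions and consumes its own Azure Storage queue - this is where the 500-queues-per-storage-account ceiling and the per-queue maintenance overhead come from once you run many parallel jobs. All IDs, secrets, and paths are placeholders; double-check the option names against the docs linked above:

```python
# Sketch of *legacy* file notification mode on Azure (Databricks notebook context).
# Each stream configured like this ends up with its own Azure Storage queue,
# created via the service principal below.
legacy_df = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.useNotifications", "true")           # legacy file notification mode
        .option("cloudFiles.subscriptionId", "<subscription-id>")
        .option("cloudFiles.resourceGroup", "<resource-group>")
        .option("cloudFiles.tenantId", "<tenant-id>")
        .option("cloudFiles.clientId", "<client-id>")
        .option("cloudFiles.clientSecret", dbutils.secrets.get("my-scope", "sp-secret"))
        .option("cloudFiles.schemaLocation", "abfss://checkpoints@mystorage.dfs.core.windows.net/schemas/invoices")
        .load("abfss://landing@mystorage.dfs.core.windows.net/invoices/")
)
```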
09-29-2025 01:32 AM
Hi @K_Anudeep @szymon_dybczak, how should I think about a situation where 100 jobs run in parallel and minimal latency is needed? Does Auto Loader connect directly to the cloud queue service, or does Databricks store and manage the detected files somewhere? Are there any Azure cloud limits in this design?
Br
09-29-2025 01:41 AM
Hi @saurabh18cs ,
Since Auto Loader in file notification mode uses Azure Queue Storage, it is subject to the limitations of Queue Storage. You can find them below:
Scalability and performance targets for Queue Storage - Azure Storage | Microsoft Learn
In my experience the queues scale really well, but you need to test this in your own environment.