Hi @saurabh18cs ,
You are making a very good decision by moving from directory listing to file notification mode. Regarding your questions:
1) Databricks recommends using one shared queue instead of creating one per Auto Loader stream. This is reflected in the newer file notification mode with file events. File notification mode comes in two flavours:
- Legacy file notification mode - no longer recommended. In this mode you manage file notification queues for each Auto Loader stream separately. Auto Loader automatically sets up a notification service and queue service that subscribes to file events from the input directory.
- File notification mode with file events - you use a single Azure Databricks-managed file notification queue for all streams that process files from any given external location defined in Unity Catalog.
File notification mode with file events has the following advantages over the legacy approach:
- Azure Databricks can set up subscriptions and file events in your cloud storage account for you without requiring that you supply additional credentials to Auto Loader using a service credential or other cloud-specific authentication options.
- You have fewer Azure managed identity policies to create in your cloud storage account.
- Because you no longer need to create a queue for each Auto Loader stream, it's easier to avoid hitting the cloud provider notification limits listed in Cloud resources used in legacy Auto Loader file notification mode.
- Azure Databricks automatically manages the tuning of resource requirements, so you don't need to tune parameters such as cloudFiles.fetchParallelism.
- Cleanup functionality means that you don't need to worry as much about the lifecycle of notifications that are created in the cloud, such as when a stream is deleted or fully refreshed.
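To make the difference between the two modes concrete, here is a minimal sketch of how the reader options might differ. The helper name `build_autoloader_options` is hypothetical, and I am assuming the option keys are `cloudFiles.useManagedFileEvents` for file events mode and `cloudFiles.useNotifications` for legacy mode, so please verify them against the current documentation before relying on this.

```python
def build_autoloader_options(file_format: str, use_file_events: bool = True) -> dict:
    """Return Auto Loader reader options for the chosen notification mode.

    Hypothetical helper for illustration only; the option keys are assumed
    from the Auto Loader documentation and should be double-checked.
    """
    options = {"cloudFiles.format": file_format}
    if use_file_events:
        # Recommended: one Databricks-managed queue per external location.
        options["cloudFiles.useManagedFileEvents"] = "true"
    else:
        # Legacy: Auto Loader sets up a separate queue for each stream.
        options["cloudFiles.useNotifications"] = "true"
    return options


if __name__ == "__main__":
    print(build_autoloader_options("json"))
    # On Databricks you would pass these to the stream reader, for example:
    # df = (spark.readStream.format("cloudFiles")
    #       .options(**build_autoloader_options("json"))
    #       .load("abfss://<container>@<account>.dfs.core.windows.net/<path>"))
```

In file events mode the input path has to sit under a Unity Catalog external location that has file events enabled, so none of the per-stream queue settings are needed in the stream itself.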
2) Regarding limitations, the main one in legacy file notification mode is that you can have only 500 queues per storage account. This is easily overcome by using the recommended file notification mode with file events.
Legacy mode is also much worse from a maintenance perspective, because you have to take care of so many queues.
Configure Auto Loader streams in file notification mode - Azure Databricks | Microsoft Learn
For Auto Loader in file events mode, the list of limitations is available at the links below:
Configure Auto Loader streams in file notification mode - Azure Databricks | Microsoft Learn
Manage external locations - Azure Databricks | Microsoft Learn