Hi everyone,
We are currently migrating our Databricks Auto Loader pipelines from directory listing mode to managed file events for processing our data in structured streaming.
We have a function that handles the structured streaming: it reads data from a specific folder path (one per table) and writes it to a Delta staging table. Previously we polled directories for new files; now we're switching to managed file events for event-driven processing.
Our use case involves processing multiple tables concurrently: we use multithreading to run the streaming function for each table in parallel, one thread per table.
Here's how we're setting it up:
Multiple tables have their own directories (e.g., container_path/system/table1/, container_path/system/table2/, etc.).
Each table is processed with structured streaming using Auto Loader and managed file events (cloudFiles.useManagedFileEvents = true).
We are multithreading the processing, so each table runs in a separate thread, with each thread initializing its own stream for its specific table directory.
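For context, here is roughly how our setup looks. This is a simplified sketch: the container path, source format, table names, and the `staging` schema are placeholder examples, and `run_stream` assumes it runs on Databricks where a live `spark` session is passed in.

```python
# One Auto Loader stream per table, each launched from its own thread.
from concurrent.futures import ThreadPoolExecutor

BASE = "abfss://container@account.dfs.core.windows.net/system"  # example path
TABLES = ["table1", "table2", "table3"]  # example table names

def autoloader_options(table: str) -> dict:
    # Per-table Auto Loader options; each stream gets its own schema location.
    return {
        "cloudFiles.format": "json",                # example source format
        "cloudFiles.useManagedFileEvents": "true",  # managed file events mode
        "cloudFiles.schemaLocation": f"{BASE}/_schemas/{table}",
    }

def run_stream(spark, table: str):
    # Starts one stream for one table directory; runs on Databricks,
    # where `spark` is the active SparkSession.
    (spark.readStream
        .format("cloudFiles")
        .options(**autoloader_options(table))
        .load(f"{BASE}/{table}/")
        .writeStream
        .option("checkpointLocation", f"{BASE}/_checkpoints/{table}")
        .toTable(f"staging.{table}"))  # per-table Delta staging table

def run_all(spark):
    # One thread per table; each thread initializes its own stream.
    with ThreadPoolExecutor(max_workers=len(TABLES)) as pool:
        for table in TABLES:
            pool.submit(run_stream, spark, table)
```

Each stream has its own checkpoint and schema location, so the tables don't share any streaming state.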
My question:
Given this setup, will Databricks create a separate event queue for each table (i.e., each stream)? In other words, for each stream running for a specific table, will there be an independent event queue that listens for file events for that particular table directory? We are processing multiple tables simultaneously, so we need to ensure that each table's events are managed independently.
To summarize:
We are running one stream per table in parallel using multithreading.
Each stream listens to a specific directory (table) and writes data to a corresponding Delta staging table.
Managed file events are used to trigger the processing when new files are added.
Are multiple queues created automatically (one for each stream and its corresponding table path), or does the system handle the event queuing in some other way?
Looking forward to your insights!
Thanks in advance!