02-09-2026 03:30 AM
Problem
We are using Auto Loader in legacy file notification mode with a pre-provisioned SQS queue (cloudFiles.useNotifications = true + cloudFiles.queueUrl). The architecture is:
S3 (s3:ObjectCreated:*) → SNS Topic → SQS Queue → Auto Loader
The S3 bucket publishes s3:ObjectCreated:* events to an SNS topic, which fans out to our SQS queue. Auto Loader consumes from this SQS queue.
When we enable File Events on the External Location (Unity Catalog) pointing to the same S3 path, SQS messages start disappearing within seconds of arrival — even though our Auto Loader job is not running at that moment.
| Scenario | Observed behavior |
| --- | --- |
| File Events disabled on External Location | Messages arrive in SQS and remain visible as expected. |
| File Events enabled on External Location (with same pre-provisioned SQS) | Messages arrive in SQS but disappear within a few seconds. No consumer is running — our Auto Loader job is not active at that time. ❌ |
| File Events disabled again + SQS config removed | Messages return to normal behavior — they arrive and stay in the queue. ✅ |
This led to data loss in our pipeline. New files pushed to S3 generated SQS messages, but those messages were consumed and deleted before our Auto Loader job ran. When the job eventually triggered (daily schedule), the SQS messages were already gone — so Auto Loader saw no new events and did not ingest the new files into the target table.
From the documentation (Auto Loader with file events overview):
Thank you
02-12-2026 06:26 AM
Hi,
This does seem to be an issue, but could you check the cloud_files_state metrics — in particular, when each file was created, discovered, and processed — and confirm whether the Auto Loader job was running at those times?
May I know if you have tried using managed file events and whether you see the same issue there?
2 weeks ago
Hi @truongtran,
Thank you for the thorough write-up with the environment details and reproducible scenarios -- that makes it much easier to pinpoint what is happening.
WHAT IS HAPPENING
When you enable File Events on an External Location in Unity Catalog, the Databricks File Events service starts actively consuming and deleting messages from the SQS queue associated with that location. As the documentation states, Databricks uses the permissions from the storage credential to "read and delete messages from the queue."
This is the background process you are seeing: the File Events service itself is the consumer that is draining your pre-provisioned SQS queue, even when your Auto Loader job is not running. The File Events service runs continuously as a managed Databricks service -- it does not depend on your Auto Loader stream being active.
WHY YOUR AUTO LOADER JOB MISSES THE FILES
Here is the sequence of events that leads to the data loss you observed:
1. A new file lands in S3, triggering an S3 event notification to your SNS topic.
2. The SNS topic delivers the message to your pre-provisioned SQS queue.
3. The Databricks File Events service (enabled on the External Location) reads and deletes the message from that SQS queue, caching the file metadata internally.
4. When your daily Auto Loader job runs using legacy file notification mode (cloudFiles.useNotifications = true + cloudFiles.queueUrl), it polls the SQS queue -- but the messages are already gone.
5. Because your job uses legacy mode (not managed file events), it does not read from the File Events cache. It only knows about the SQS queue, which is now empty.
The fundamental problem is that legacy file notification mode and managed File Events are two separate file discovery mechanisms, and they conflict when pointed at the same SQS queue. The File Events service consumes the messages that your legacy Auto Loader job expects to find.
THE ROOT CAUSE: TWO COMPETING CONSUMERS
The Databricks documentation on Auto Loader options explicitly states that these options are mutually exclusive:
- cloudFiles.useNotifications / cloudFiles.queueUrl (legacy file notification mode)
- cloudFiles.useManagedFileEvents (managed file events mode)
When you enable File Events on the External Location, you are activating the managed file events infrastructure. But your Auto Loader job is still configured for legacy mode. The result is two competing consumers on the same queue: the File Events service wins the race because it runs continuously, while your job only runs once daily.
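The race described above can be illustrated with a toy simulation in plain Python — no AWS involved, with `queue.Queue` standing in for SQS and a background thread standing in for the always-on File Events service. All names here are illustrative:

```python
import queue
import threading
import time

q: "queue.Queue[str]" = queue.Queue()  # stands in for the shared SQS queue

drained_by_service = []   # messages consumed by the always-on service
seen_by_daily_job = []    # messages the scheduled legacy job actually finds

def file_events_service(stop: threading.Event) -> None:
    """Always-on consumer: reads and deletes messages as they arrive,
    analogous to the File Events service once it is enabled."""
    while not stop.is_set():
        try:
            drained_by_service.append(q.get(timeout=0.05))
        except queue.Empty:
            pass

stop = threading.Event()
threading.Thread(target=file_events_service, args=(stop,), daemon=True).start()

# Files land in S3 -> SNS -> SQS while the daily job is idle.
for i in range(5):
    q.put(f"s3://bucket/path/file-{i}.json")
time.sleep(0.5)  # the daily job has not started yet

# The daily (legacy-mode) job finally polls the queue -- it is empty.
while not q.empty():
    seen_by_daily_job.append(q.get_nowait())
stop.set()

print(len(drained_by_service), len(seen_by_daily_job))  # 5 0
```

The continuously running consumer wins every time, which matches the observed behavior: messages vanish within seconds while the scheduled job finds nothing.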
HOW TO RESOLVE THIS
You have two options:
Option 1: Switch to Managed File Events (Recommended)
This is the recommended path going forward. Reconfigure your Auto Loader job to use managed file events instead of legacy file notification mode:
```python
spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "your_format") \
    .option("cloudFiles.useManagedFileEvents", "true") \
    .load("s3://your-bucket/your-path/")
```
Remove the cloudFiles.useNotifications and cloudFiles.queueUrl options entirely. With this configuration, Auto Loader reads from the File Events cache instead of polling SQS directly. This approach:
- Requires Databricks Runtime 14.3 LTS or higher
- Requires File Events to be enabled on the External Location (which you already have)
- Does not require you to manage your own SQS queue
- Supports all Auto Loader streams on the same bucket with a single queue
Important: the file events cache holds metadata for files modified in the last 7 days, so you should run Auto Loader at least once every 7 days. For your daily schedule, this is not an issue.
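As a trivial sanity check, the scheduling constraint can be expressed as a comparison of the job's run interval against the 7-day retention window (plain Python sketch; function and constant names are my own):

```python
from datetime import timedelta

# Per the docs, the file events cache covers files modified in the last 7 days.
CACHE_RETENTION = timedelta(days=7)

def schedule_is_safe(run_interval: timedelta) -> bool:
    """True if the stream runs often enough that no cached event can expire
    between two consecutive runs."""
    return run_interval <= CACHE_RETENTION

assert schedule_is_safe(timedelta(days=1))       # daily schedule: fine
assert not schedule_is_safe(timedelta(days=10))  # less than weekly: events may expire
```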
Note that on the very first run after switching, Auto Loader will perform a full directory listing to get current with the file events cache, so it should pick up any files that were missed.
Option 2: Keep Legacy Mode and Disable File Events on the External Location
If you need to stay on legacy file notification mode for now, disable File Events on the External Location to stop the managed service from consuming your SQS messages. This is what you observed working in your third test scenario.
With this option, your existing architecture (S3 → SNS → SQS → Auto Loader) continues to work as before.
RECOVERING MISSED DATA
If you have already lost SQS messages and need to catch up on missed files, you can do a one-time backfill. One approach:
```python
spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "your_format") \
    .option("cloudFiles.useManagedFileEvents", "true") \
    .option("cloudFiles.includeExistingFiles", "true") \
    .load("s3://your-bucket/your-path/")
```
Setting includeExistingFiles to true on the first run triggers a full directory listing, which will discover all existing files regardless of whether SQS messages were consumed.
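Conceptually, the backfill is a full listing reconciled against what was already ingested, with notification messages playing no role. A toy sketch in plain Python (local paths stand in for S3 objects; the function name is illustrative):

```python
import os
import tempfile

def backfill_listing(root: str, already_ingested: set) -> list:
    """Walk the whole tree and return every file not yet ingested --
    lost queue messages cannot cause a file to be skipped here."""
    missed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if path not in already_ingested:
                missed.append(path)
    return sorted(missed)

# Usage: three files exist; only one was ingested before its SQS message vanished.
root = tempfile.mkdtemp()
paths = [os.path.join(root, f"file-{i}.json") for i in range(3)]
for p in paths:
    open(p, "w").close()

missed = backfill_listing(root, already_ingested={paths[0]})
print(len(missed))  # the two files whose notifications were lost
```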
REFERENCES
- Auto Loader with file events overview: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/file-events-explained
- Configure Auto Loader in file notification mode: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/file-notification-mode
- Auto Loader configuration options: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options
- Manage external locations (File Events): https://docs.databricks.com/aws/en/connect/unity-catalog/manage-external-locations
I hope this helps clarify the behavior you observed. The key takeaway is that enabling File Events on an External Location introduces a background consumer for SQS messages, which conflicts with legacy file notification mode. Switching to managed file events (Option 1) is the cleanest resolution and aligns with the recommended approach going forward.
* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.
2 weeks ago
Really appreciate your response