SQS messages disappear immediately when File Events enabled on External Location with pre-provisione
02-09-2026 03:30 AM
Environment
- Cloud: AWS
- Unity Catalog: Enabled
- Auto Loader Mode: File Notification (Legacy) with pre-provisioned SQS
Problem
We are using Auto Loader in legacy file notification mode with a pre-provisioned SQS queue (cloudFiles.useNotifications = true + cloudFiles.queueUrl). The architecture is:
S3 (s3:ObjectCreated:*) → SNS Topic → SQS Queue → Auto Loader
The S3 bucket publishes s3:ObjectCreated:* events to an SNS topic, which fans out to our SQS queue. Auto Loader consumes from this SQS queue.
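For reference, a stream in this mode is typically configured roughly as follows. This is a minimal sketch, not our exact job: the helper names, the "json" format, and the example queue URL and path in the comments are placeholders assumed for illustration. A real stream also needs a schema or cloudFiles.schemaLocation, which is omitted here.

```python
from typing import Dict


def legacy_notification_options(queue_url: str, file_format: str = "json") -> Dict[str, str]:
    """Build Auto Loader options for legacy file notification mode
    with a pre-provisioned SQS queue (hypothetical helper)."""
    return {
        # Required by Auto Loader; "json" is only a placeholder here.
        "cloudFiles.format": file_format,
        # Use file notifications instead of directory listing.
        "cloudFiles.useNotifications": "true",
        # Consume from an existing queue rather than letting Auto Loader create one.
        "cloudFiles.queueUrl": queue_url,
    }


def build_stream(spark, source_path: str, queue_url: str):
    """Return a streaming DataFrame. Needs a Spark session on Databricks;
    nothing here runs until this function is called."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**legacy_notification_options(queue_url))
        .load(source_path)
    )


# Example call on a cluster (placeholder values):
# df = build_stream(
#     spark,
#     "s3://example-bucket/landing/",
#     "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue",
# )
```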
When we enable File Events on the External Location (Unity Catalog) pointing to the same S3 path, SQS messages start disappearing within seconds of arrival — even though our Auto Loader job is not running at that moment.
SQS Configuration
- Queue type: Standard
- Visibility timeout: 1 hour
- Message retention period: 4 days (default)
- No other consumers configured on this queue — only Databricks
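The settings above can be double-checked against the live queue. The sketch below assumes a boto3 SQS client is passed in (boto3 itself is not imported here) and uses its get_queue_attributes call. The function name and the EXPECTED table are made up for illustration, with the values from this post converted to seconds.

```python
from typing import Any, Dict

# Values from the configuration listed above, in seconds.
EXPECTED = {
    "VisibilityTimeout": 3600,         # 1 hour
    "MessageRetentionPeriod": 345600,  # 4 days
}


def config_mismatches(sqs_client: Any, queue_url: str) -> Dict[str, int]:
    """Return the live value of every attribute that differs from EXPECTED.

    A missing attribute is reported as -1.
    """
    resp = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=list(EXPECTED),
    )
    # SQS returns attribute values as strings.
    live = {name: int(value) for name, value in resp.get("Attributes", {}).items()}
    return {
        name: live.get(name, -1)
        for name, want in EXPECTED.items()
        if live.get(name) != want
    }


# Example call (requires boto3 and AWS credentials):
# import boto3
# print(config_mismatches(boto3.client("sqs"), "<queue url>"))
```

An empty result means the queue matches the values listed above.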
Observed Behavior
| Scenario | Observed behavior |
| --- | --- |
| File Events disabled on External Location | Messages arrive in SQS and remain visible as expected. |
| File Events enabled on External Location (with same pre-provisioned SQS) | Messages arrive in SQS but disappear within a few seconds. No consumer is running; our Auto Loader job is not active at that time. ❌ |
| File Events disabled again + remove SQS config | Messages return to normal behavior: they arrive and stay in the queue. ✅ |
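One way to tell whether a "disappeared" message was only received (in flight) or actually deleted is to read both approximate counters on the queue. With a 1-hour visibility timeout, a message that was received but not deleted would stay counted as not visible for up to an hour. If both counters drop to zero within seconds, the message was deleted. The sketch below assumes a boto3 SQS client is passed in; the function names and labels are illustrative only, and the counters are approximate by design.

```python
from typing import Any, Dict


def queue_counters(sqs_client: Any, queue_url: str) -> Dict[str, int]:
    """Read the approximate visible and in-flight message counts."""
    resp = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )
    attrs = resp.get("Attributes", {})
    return {
        "visible": int(attrs.get("ApproximateNumberOfMessages", 0)),
        "in_flight": int(attrs.get("ApproximateNumberOfMessagesNotVisible", 0)),
    }


def classify(counters: Dict[str, int]) -> str:
    """Rough label for the queue state shortly after a file upload."""
    if counters["visible"] > 0:
        return "visible"
    if counters["in_flight"] > 0:
        return "received but not deleted"
    return "empty (deleted or never delivered)"


# Example call (requires boto3 and AWS credentials):
# import boto3
# print(classify(queue_counters(boto3.client("sqs"), "<queue url>")))
```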
Impact
This led to data loss in our pipeline. New files pushed to S3 generated SQS messages, but those messages were consumed and deleted before our Auto Loader job ran. When the job eventually triggered (daily schedule), the SQS messages were already gone — so Auto Loader saw no new events and did not ingest the new files into the target table.
What We Understand
From the documentation (Auto Loader with file events overview):
- The Databricks File Events service listens to file events and caches file metadata.
- Databricks uses the permissions from the storage credential to read and delete messages from the queue.
- There is only one queue and storage event subscription per external location.
Questions
- Why do SQS messages disappear within seconds even though no Databricks job is running? When File Events is enabled on the External Location, something is consuming and deleting messages from our pre-provisioned SQS queue — but our Auto Loader job is scheduled daily and was not active at that time. What background process is consuming these messages?
- Why doesn't the next Auto Loader job run process those events? If the Databricks File Events service consumed the SQS messages to build its internal cache, shouldn't Auto Loader with cloudFiles.useNotifications = true + cloudFiles.queueUrl still be able to discover those files on the next run — either from the cache or from the queue?
Thank you