Yes. Azure Databricks Auto Loader with Databricks-managed file notification mode for Unity Catalog external locations has been implemented successfully by users, especially since it entered public preview in 2025. It is designed to simplify file discovery and event-driven ingestion for cloud data engineers.
Steps for Setup

- **Workspace Requirements**: You need an Azure Databricks workspace with Unity Catalog enabled, and you must be able to create storage credential and external location objects in Unity Catalog.
- **Create Credentials and External Location**:
  - In Unity Catalog, create a storage credential that grants Databricks access to your source cloud storage.
  - Register an external location pointing to your cloud storage path.
- **Enable Managed File Events**:
  - Enable file events for the external location in Databricks; this replaces the per-stream queues of legacy notification mode and reduces IAM complexity.
  - For each Auto Loader stream, set the option `cloudFiles.useManagedFileEvents` to `true` on the stream (or pass `useManagedFileEvents => 'True'` to `read_files` in declarative pipelines). See the sketch after this list.
- **Permissions**: The executing user or cluster/service principal must have `READ FILES` permission on the external location, plus permission to create external locations and storage credentials in Unity Catalog.
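To make the steps concrete, here is a minimal notebook sketch under stated assumptions: the storage credential `landing_cred`, the external location `landing_zone`, the storage account, paths, and the target table are all hypothetical placeholders, and file events are assumed to already be enabled on the external location (via its settings in Catalog Explorer), which isn't shown here.

```python
# Minimal sketch, run from a Databricks notebook. All names, paths, and the
# storage account are hypothetical; the storage credential `landing_cred` is
# assumed to already exist (created via Catalog Explorer or the API).

# Register an external location backed by the storage credential.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
    URL 'abfss://landing@examplestorage.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL landing_cred)
""")

# Start an Auto Loader stream that relies on managed file events.
events = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useManagedFileEvents", "true")
    .option(
        "cloudFiles.schemaLocation",
        "abfss://landing@examplestorage.dfs.core.windows.net/_schemas/events",
    )
    .load("abfss://landing@examplestorage.dfs.core.windows.net/events/")
)

(
    events.writeStream
    .option(
        "checkpointLocation",
        "abfss://landing@examplestorage.dfs.core.windows.net/_checkpoints/events",
    )
    .trigger(availableNow=True)  # drain pending file events, then stop
    .toTable("main.bronze.events")
)
```

In a declarative pipeline, the equivalent is passing `useManagedFileEvents => 'True'` to `read_files`, as noted in the list above.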
IAM and Unity Catalog Configuration

- **IAM Policies**: Fewer managed identity policies are required than in legacy notification mode. You typically need just one managed identity configured for your external location, and Databricks sets up the necessary event subscriptions automatically.
- **Unity Catalog**:
  - Register the external location with Unity Catalog and grant the right permissions (usually at least `READ FILES` for Auto Loader ingestion); see the grant sketch after this list.
  - Checkpoint and schema storage should live in cloud storage locations governed by Unity Catalog.
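As an illustration, the grants involved might look like the following sketch. The principals (`platform_admins`, `etl_service_principal`) and the external location name are hypothetical; the metastore-level privileges are only needed by whoever creates the credential and location, not by the stream itself.

```python
# Hypothetical principals and object names, for illustration only.

# Metastore-level privileges for the team that creates credentials/locations.
spark.sql("GRANT CREATE STORAGE CREDENTIAL ON METASTORE TO `platform_admins`")
spark.sql("GRANT CREATE EXTERNAL LOCATION ON METASTORE TO `platform_admins`")

# Read access on the external location for the principal running Auto Loader.
spark.sql(
    "GRANT READ FILES ON EXTERNAL LOCATION landing_zone TO `etl_service_principal`"
)

# Verify what has been granted on the location.
display(spark.sql("SHOW GRANTS ON EXTERNAL LOCATION landing_zone"))
```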
Limitations

- **Runtime Requirement**: You must run Databricks Runtime 14.3 LTS or later for managed file notification mode.
- **Unsupported Features**: Certain legacy settings are ignored, including manual parallelism (`cloudFiles.fetchParallelism`), `cloudFiles.useNotifications`, `cloudFiles.useIncrementalListing`, `cloudFiles.pathRewrites`, and `cloudFiles.backfillInterval`.
- **Frequency of Job Runs**: File event caches expire after about seven days; if the stream isn't run within that window, Auto Loader may fall back to directory listing, losing some of the efficiency gains.
- **Source Path Changes**: Changing the source path in file notification mode is unsupported; doing so may cause ingestion failures for files already present at the new path.
- **No Premium Storage Support**: Azure Premium Storage accounts aren't compatible because they lack the queue storage needed for notifications.
Best Practices and Lessons Learned

- **Run Streams Frequently**: Run your stream at least once every seven days to prevent cache expiry; a scheduling sketch follows this list.
- **Leverage Automatic Resource Management**: Let Databricks manage parallelism and backfill settings; manual tuning is unnecessary and isn't respected in this mode.
- **Clean Up If Migrating**: If you migrate from legacy notification mode, switch off and delete the old queues and notification resources of each existing Auto Loader stream before activating managed file events.
- **Monitor Permissions**: Keep your Unity Catalog and managed identity permissions up to date, especially if multiple teams share datasets.
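One way to guarantee the seven-day cadence is to schedule the ingestion notebook as a job. The sketch below uses the Databricks SDK for Python (`databricks-sdk`) with a hypothetical job name, notebook path, and cron schedule; compute configuration is omitted (assuming serverless jobs are available in the workspace).

```python
# Sketch using the Databricks SDK for Python. Job name, notebook path, and
# schedule are hypothetical; a daily run keeps the stream well inside the
# ~7-day file-event cache window.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="autoloader-events-bronze",
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # every day at 02:00
        timezone_id="UTC",
    ),
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/ingest/events_bronze"
            ),
        )
    ],
)
```

With the `trigger(availableNow=True)` pattern shown earlier, each scheduled run drains all pending file events and then stops, so a daily or weekly job is enough to keep the cache warm without a continuously running cluster.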
Summary Table
| Step | Unity Catalog/IAM Action | Limitation |
|---|---|---|
| Create storage credential | Must have create permissions | |
| Register external location | Grant `READ FILES` | |
| Enable managed file events | Reduces IAM complexity; one managed queue | Requires Databricks Runtime 14.3 LTS+ |
| Configure Auto Loader stream | Set `cloudFiles.useManagedFileEvents` to `true` | Some legacy settings ignored |
| Clean up legacy notification resources | Remove old queues if migrating | Don't change the source path |
| Run stream frequently | | Cache expires after ~7 days |
Overall, this mode is significantly simpler and more performant than the older per-stream notification model, with fewer maintenance tasks once configuration is finished.