
Implementing ADB Autoloader with Managed File Notification Mode for UC Ext Location (public preview)

saicharandeepb
New Contributor III

Hi everyone,
I'm planning to implement Azure Databricks Auto Loader using the Databricks-managed file notification mode for an external location registered in Unity Catalog. I understand this feature is currently in public preview, and I'd love to hear from anyone who has already implemented it.

  • Have you successfully set this up?
  • What steps did you follow to enable managed file notifications for an external location?
  • Were there any specific IAM or Unity Catalog configurations required?
  • What limitations should I be aware of while using this mode?
  • Any best practices or lessons learned you can share?
1 REPLY

mark_ott
Databricks Employee

Yes, users have successfully set up Azure Databricks Auto Loader with Databricks-managed file notification mode for external locations in Unity Catalog since the feature entered public preview in 2025. It is designed to simplify file discovery and event-driven ingestion for cloud data engineers.

Steps for Setup

  • Workspace Requirements
    You need an Azure Databricks workspace with Unity Catalog enabled, and you must be able to create storage credential and external location objects in Unity Catalog.

  • Create Credentials and External Location

    • In Unity Catalog, create a storage credential that grants Databricks access to your source cloud storage.

    • Register an external location pointing to your cloud storage path.
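
    A minimal sketch of the external location step, run from a notebook via spark.sql. The location name, credential name, and URL below are placeholders, and this assumes the storage credential (for example, one backed by an Azure Databricks access connector's managed identity) was already created in Catalog Explorer.

    ```python
    # Sketch only: location name, credential name, container, and account
    # are placeholders. Assumes the storage credential already exists in
    # Unity Catalog.
    spark.sql("""
        CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
        URL 'abfss://landing@mystorageacct.dfs.core.windows.net/ingest'
        WITH (STORAGE CREDENTIAL my_storage_credential)
    """)
    ```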

  • Enable Managed File Events

    • Enable file events on the external location; with managed file events, Databricks maintains a single event subscription per location instead of one queue per stream, which reduces IAM complexity.

    • For each Auto Loader stream, set the reader option cloudFiles.useManagedFileEvents to true (or use useManagedFileEvents => 'True' for declarative pipelines); see the PySpark sketch below.
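
    A minimal PySpark sketch of a stream with the option enabled; the source format, paths, and target table are placeholders.

    ```python
    # Sketch only: format, paths, and table name are placeholders.
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Opt in to Databricks-managed file events (public preview).
        .option("cloudFiles.useManagedFileEvents", "true")
        .option("cloudFiles.schemaLocation",
                "abfss://landing@mystorageacct.dfs.core.windows.net/_schemas/orders")
        .load("abfss://landing@mystorageacct.dfs.core.windows.net/ingest/orders")
    )

    (
        stream.writeStream
        .option("checkpointLocation",
                "abfss://landing@mystorageacct.dfs.core.windows.net/_checkpoints/orders")
        .trigger(availableNow=True)  # run as a scheduled batch; see best practices below
        .toTable("main.bronze.orders")
    )
    ```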

  • Permissions

    • The executing user or cluster/service principal must have READ FILES permissions on the external location, plus permissions to create external locations and storage credentials in Unity Catalog.
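
    On the grants side, a hedged example; the principal and location names are placeholders.

    ```python
    # Sketch only: grant read access on the external location to whichever
    # principal (user, group, or service principal) runs the stream.
    spark.sql(
        "GRANT READ FILES ON EXTERNAL LOCATION landing_zone "
        "TO `data-eng-pipeline-sp`"
    )
    ```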

IAM and Unity Catalog Configuration

  • IAM Policies

    • Fewer managed identity policies are required compared to legacy notification mode. You typically just need one managed identity configured for your external location, and Databricks will set up the necessary event subscriptions automatically.

  • Unity Catalog

    • Must register the external location with Unity Catalog and grant the right permissions (usually at least READ FILES for Auto Loader ingestion).

    • Checkpoint and schema storage should be linked to Unity Catalog-managed cloud storage locations.

Limitations

  • Runtime Requirement

    • You must run Databricks Runtime 14.3 LTS or later for the managed file notification mode.

  • Unsupported Features

    • Certain legacy settings are ignored: manual parallelism (cloudFiles.fetchParallelism), cloudFiles.useNotifications, cloudFiles.useIncrementalListing, cloudFiles.pathRewrites, and cloudFiles.backfillInterval.

  • Frequency of Job Runs

    • File event caches expire after about seven days; if the stream isn't invoked within that window, Auto Loader may fall back to directory listing, losing some efficiency gains.

  • Source Path Changes

    • Changing the source path in file notification mode is unsupported; doing so may cause ingestion failures for files already present at the new path.

  • Not Supported for Premium Storage

    • Azure Premium Storage accounts aren't compatible because they lack the queue storage needed for notifications.

Best Practices and Lessons Learned

  • Run Streams Frequently

    • Run your stream at least once every seven days to prevent cache expiry.
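
    One way to keep that cadence is a scheduled job. Below is a sketch using the Databricks Python SDK; the job name, cron expression, and notebook path are placeholders, and the exact call shape should be checked against the current databricks-sdk docs. A schedule configured in the Workflows UI achieves the same thing.

    ```python
    # Sketch only (assumes databricks-sdk is installed and authentication
    # is configured). Schedules the ingestion notebook daily, well inside
    # the 7-day file-event cache window.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()
    w.jobs.create(
        name="autoloader-orders-ingest",
        schedule=jobs.CronSchedule(
            quartz_cron_expression="0 0 6 * * ?",  # every day at 06:00
            timezone_id="UTC",
        ),
        tasks=[
            jobs.Task(
                task_key="ingest",
                notebook_task=jobs.NotebookTask(
                    notebook_path="/Workspace/ingest/orders"
                ),
            )
        ],
    )
    ```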

  • Leverage Automatic Resource Management

    • Let Databricks manage parallelism and backfill settings; manual tuning is not needed and isn't respected in this mode.

  • Clean Up If Migrating

    • If you migrate from legacy notification mode, stop each existing Auto Loader stream and delete its old queues and notification resources before activating managed file events.

  • Monitor Permissions

    • Ensure your Unity Catalog and managed identity permissions are always up to date, especially if multiple teams share datasets.

Summary Table

| Step | Unity Catalog/IAM Action | Limitation |
| --- | --- | --- |
| Create storage credential | Must have create permissions | |
| Register external location | Grant READ FILES | |
| Enable managed file events | Reduced IAM complexity; one queue per location | Requires Databricks Runtime 14.3+ |
| Configure Auto Loader stream | Set cloudFiles.useManagedFileEvents=true | Some legacy settings ignored |
| Clean up legacy notification resources | Remove old queues if migrating | Don't change the source path |
| Run stream frequently | | Cache expires after 7 days |

This mode is significantly simpler and more performant than the older per-stream notification model, with fewer maintenance tasks once configuration is finished.
