
Configure Autoloader with the file notification mode for production

Chris_Konsur
New Contributor III

I configured ADLS Gen2 standard storage and successfully configured Autoloader with the file notification mode.

In this document

https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html

"ADLS Gen2 provides different event notifications for files appearing in your Gen2 container. Auto Loader listens for the FlushWithClose event for processing a file."

Do I need to do anything with this FlushWithClose event, or does Auto Loader, when file notification mode is enabled, automatically listen for the FlushWithClose event from the REST API?

Next, the same document says: "Databricks recommends triggering regular backfills with Auto Loader by using the cloudFiles.backfillInterval option to guarantee that all files are discovered within a given SLA if data completeness is required. Triggering regular backfills does not cause duplicates."

From <https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html>

Finally, I found this article on how to use the Auto Loader Resource Manager Scala API:

https://www.mssqltips.com/sqlservertip/6965/databricks-auto-loader-manager/

Do you know if this Resource Manager is available in Python?

Accepted Solutions

Ryan_Chynoweth
Honored Contributor III

Hi, @Chris Konsur​.

You do not need to do anything with the FlushWithClose event or the REST API; that is just the event type that we listen for.
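For reference, here is a minimal Python sketch of turning on file notification mode, assuming a Databricks notebook where `spark` already exists. The helper names (`notification_mode_options`, `build_stream`) and the paths are hypothetical; the option keys `cloudFiles.format`, `cloudFiles.schemaLocation` and `cloudFiles.useNotifications` are the documented Auto Loader options. The Azure service principal and resource group options that notification mode also needs are left out here.

```python
def notification_mode_options(file_format: str, schema_location: str) -> dict:
    """Build Auto Loader options that enable file notification mode."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema (hypothetical path supplied by the caller).
        "cloudFiles.schemaLocation": schema_location,
        # Switches Auto Loader from directory listing to file notification mode.
        # No extra handling of the FlushWithClose event is configured here.
        "cloudFiles.useNotifications": "true",
    }


def build_stream(spark, source_path: str, file_format: str, schema_location: str):
    """Create the streaming reader; `spark` is an existing SparkSession on Databricks."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**notification_mode_options(file_format, schema_location))
        .load(source_path)
    )
```

On a cluster this would be called as something like `build_stream(spark, "<your abfss path>", "json", "<your schema path>")`, with both paths replaced by your own.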

As for the backfill setting, it is for handling late data or events that are triggered late. This setting largely depends on your SLAs: it determines how often you should be doing a full reconciliation of the data that has been processed. I would also recommend checking out incremental file listing.
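As a rough illustration of the backfill setting, the sketch below adds `cloudFiles.backfillInterval` to a set of Auto Loader options. The helper name `with_backfill` and the `"1 day"` default are assumptions for the example; pick the interval that matches your SLA.

```python
def with_backfill(options: dict, interval: str = "1 day") -> dict:
    """Return a copy of the Auto Loader options with a regular backfill interval set."""
    merged = dict(options)  # copy so the caller's dict is not modified
    # Interval string such as "1 day" or "1 week"; the right value depends on your SLA.
    merged["cloudFiles.backfillInterval"] = interval
    return merged


base_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",
}
stream_options = with_backfill(base_options, "1 week")
```

The resulting dict can be passed to the reader with `.options(**stream_options)`.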

As for the resource manager, I do not believe there is a Python version.


Kaniz
Community Manager

Hi @Chris Konsur, there is a thread with a similar issue here on our community: https://community.databricks.com/s/feed/0D58Y00009MHGcQSAX

Please check it out and let us know if that does not help.

Excellent, thank you, Ryan!
