Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Configure Auto Loader with file notification mode for production

Chris_Konsur
New Contributor III

I set up ADLS Gen2 standard storage and successfully configured Auto Loader with file notification mode.

In this document:

https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html

it says: "ADLS Gen2 provides different event notifications for files appearing in your Gen2 container. Auto Loader listens for the FlushWithClose event for processing a file."

Do I need to do anything with this FlushWithClose event, or does Auto Loader, when configured with file notification mode enabled, automatically listen for the FlushWithClose event through the REST API? (My configuration is roughly the sketch below.)

Next, in the same document, Databricks recommends triggering regular backfills with Auto Loader: using the cloudFiles.backfillInterval option guarantees that all files are discovered within a given SLA if data completeness is required, and triggering regular backfills does not cause duplicates.

From https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html
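
For reference, here is roughly what my configuration looks like (the container, schema, checkpoint path, and backfill interval below are placeholders, not my real values):

```python
# Sketch of my Auto Loader stream in file notification mode (PySpark on Databricks,
# where `spark` is the notebook's SparkSession). Paths, schema, and the interval
# are placeholders; the authentication options Auto Loader needs to create the
# event queue/Event Grid subscription are omitted here.
from pyspark.sql.types import StringType, StructField, StructType

schema = StructType([
    StructField("id", StringType(), True),
    StructField("payload", StringType(), True),
])

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # File notification mode: Auto Loader creates and reads the event queue
    # itself, so (as I understand it) FlushWithClose needs no extra handling.
    .option("cloudFiles.useNotifications", "true")
    # Regular backfill per the docs, e.g. once a day, for the completeness SLA.
    .option("cloudFiles.backfillInterval", "1 day")
    .schema(schema)
    .load("abfss://container@account.dfs.core.windows.net/landing/")
)

(
    df.writeStream
    .option("checkpointLocation",
            "abfss://container@account.dfs.core.windows.net/checkpoints/autoloader_demo/")
    .toTable("bronze.events")
)
```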

Finally, I found this article on how to use the Auto Loader resource manager Scala API:

https://www.mssqltips.com/sqlservertip/6965/databricks-auto-loader-manager/

Do you know if this resource manager is available in Python?

1 ACCEPTED SOLUTION

Accepted Solutions

Ryan_Chynoweth
Esteemed Contributor

Hi, @Chris Konsur.

You do not need to do anything with the FlushWithClose event REST API; that is just the event type that Auto Loader listens to.

As for the backfill setting, it is for handling late data or late events that are being triggered. This setting largely depends on your SLAs: it determines how often you should do a full reconciliation of the data that has been processed. I would also recommend checking out incremental file listing as well.
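
For example, incremental listing in directory listing mode looks roughly like this (the paths and option values below are just placeholders):

```python
# Sketch only -- directory listing mode with incremental listing; paths are placeholders.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # No "cloudFiles.useNotifications" here: this is directory listing mode.
    # "auto" lets Auto Loader decide when to fall back to a full listing.
    .option("cloudFiles.useIncrementalListing", "auto")
    # Let Auto Loader infer and track the schema instead of supplying one.
    .option("cloudFiles.schemaLocation",
            "abfss://container@account.dfs.core.windows.net/schemas/autoloader_demo/")
    .load("abfss://container@account.dfs.core.windows.net/landing/")
)
```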

As for the resource manager, I do not believe there is a Python version.


2 REPLIES


Excellent, thank you, Ryan!
