07-23-2024 01:37 PM - edited 07-23-2024 01:41 PM
Hi @8b1tz ,
Glad that it worked for you. You don't have to run it continuously; you can run it as batch jobs with Trigger.AvailableNow (see the link below, the cost considerations section):
Configure Auto Loader for production workloads | Databricks on AWS
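To make the batch-style run concrete, here is a minimal sketch of an Auto Loader stream that processes whatever is available and then stops. The function names, the paths, and the table name are hypothetical placeholders of mine; `spark` is assumed to be the session a Databricks notebook or job already provides.

```python
def autoloader_options(file_format, schema_path):
    """Build the basic Auto Loader (cloudFiles) reader options."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema (hypothetical path).
        "cloudFiles.schemaLocation": schema_path,
    }


def run_autoloader_batch(spark, source_path, schema_path, checkpoint_path,
                         target_table, file_format="json"):
    """Ingest all files that are new since the last run, then stop."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, schema_path))
        .load(source_path)
    )
    query = (
        stream.writeStream
        # The checkpoint records which files were already processed.
        .option("checkpointLocation", checkpoint_path)
        # Process everything available now, then shut the stream down.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
    query.awaitTermination()
    return query
```

Scheduled as a job (say, once an hour), something like `run_autoloader_batch(spark, "abfss://...", "/tmp/schema", "/tmp/checkpoint", "bronze.events")` would only keep the cluster up for the duration of each run instead of 24/7.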
As for the Event Grid part, read about file notification mode in Auto Loader (or watch the video below). In short, this mode is recommended for efficiently ingesting large amounts of data.
In file notification mode, Auto Loader automatically (you can set it manually if you prefer) sets up a notification service (Event Grid) and queue service (Storage Queue) that subscribes to file events from the input directory.
So it works like this: a new file arrives in your storage, then Event Grid sends information about the new file to the storage queue. Finally, Auto Loader checks the storage queue for new files to process. Once Auto Loader has successfully processed the data, it clears those messages from the queue and saves that information in its checkpoint.
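As a rough sketch of how this mode is switched on, the helper below builds the reader options for file notification mode. The helper name and its arguments are my own; the option keys are the `cloudFiles.*` settings as I understand them from the Auto Loader docs, so please double-check them against the documentation for your runtime. If you let Auto Loader create Event Grid and the queue itself, it also needs authentication options (service principal details), which are left out here.

```python
def notification_options(file_format, schema_path,
                         queue_name=None, connection_string=None):
    """Build Auto Loader options for file notification mode.

    If queue_name is given, Auto Loader is pointed at an existing
    storage queue (manual setup) instead of creating its own.
    """
    opts = {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_path,
        # Switch from directory listing to file notification mode.
        "cloudFiles.useNotifications": "true",
    }
    if queue_name is not None:
        if connection_string is None:
            raise ValueError("connection_string is required with queue_name")
        # Manual setup: consume events from a queue you created yourself.
        opts["cloudFiles.queueName"] = queue_name
        opts["cloudFiles.connectionString"] = connection_string
    return opts
```

These options would be passed to the reader with `spark.readStream.format("cloudFiles").options(**notification_options(...))`; the rest of the stream, including the checkpoint location, stays the same as in directory listing mode.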
Auto Loader will add all new data to the target table, so each run loads only the data that has arrived since the previous run.
Az Databricks # 28:- Autoloader in Databricks || File Notification mode (youtube.com)