03-27-2026 03:42 AM
Hi @JIWON ,
1. There is no such option.
2. Assuming the job is triggered every hour, the spike every 8 hours is expected: seven runs do incremental listings and the eighth does a full directory listing. From the documentation:
To ensure eventual completeness of data in auto mode, Auto Loader automatically triggers a full directory list after completing 7 consecutive incremental lists. You can control the frequency of full directory lists by setting cloudFiles.backfillInterval to trigger asynchronous backfills at a given interval.
3. So, if you want the full scan to run less or more often, set an interval with the cloudFiles.backfillInterval option, for example .option("cloudFiles.backfillInterval", "1 week"). Just bear in mind that the full listing is what picks up any files the incremental listings missed, so running it less often means missed files can go undetected for longer.
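For reference, here is a minimal sketch of where that option would go in an hourly job. The paths, table name, and JSON file format are hypothetical placeholders, not taken from your setup, and I am assuming the job runs with an availableNow trigger so that each hourly run processes what is pending and then stops.

```python
# Hypothetical locations and table name, for illustration only.
SOURCE_PATH = "/Volumes/main/raw/events"
CHECKPOINT_PATH = "/Volumes/main/raw/_checkpoints/events"
TARGET_TABLE = "main.bronze.events"

AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",                   # assumed file format
    "cloudFiles.schemaLocation": CHECKPOINT_PATH,  # where the inferred schema is stored
    "cloudFiles.backfillInterval": "1 week",       # full listing as a weekly backfill
}


def start_hourly_ingest(spark):
    """Start one run of the Auto Loader stream using the options above.

    `spark` is the active SparkSession (predefined in a Databricks notebook).
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**AUTOLOADER_OPTIONS)
        .load(SOURCE_PATH)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", CHECKPOINT_PATH)
        .trigger(availableNow=True)  # process pending files, then stop
        .toTable(TARGET_TABLE)
    )
```

You would call start_hourly_ingest(spark) from the scheduled job. Keeping the options in one dictionary makes it easy to try a different interval without touching the rest of the stream definition.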
Hope it helps.
P.S. I'm really curious to understand which of your real-time requirements are not compatible with the File events mode. You would still be able to run the job every hour (and not in real time) with File events mode.
Best regards,