03-15-2023 02:52 AM
I'm using Auto Loader in directory listing mode (without incremental file listing), and sometimes new files are not picked up and do not show up in the cloud_files-listing.
I have found that setting the 'cloudFiles.backfillInterval' option resolves the detection of these files, so it seems to me that this is an effect of file discovery not being 100% guaranteed.
Now I am wondering what the 'cloudFiles.backfillInterval' option actually does, as I find the documentation ambiguous.
Will 'cloudFiles.backfillInterval':
PS: When looking at the cloud_files-listing I do not get any discovery_times. I suppose these are only relevant in file notification mode?
03-15-2023 05:52 AM
Hi @Fabrice Deseyn, the backfillInterval option is there to make sure that all files eventually get processed. The backfill does not work on the new files. All new files are processed according to your configuration of directory listing or file notification mode. Since there is no 100% guarantee that all files will be processed, the backfill process runs asynchronously to pick up any old files that have not been processed. Using backfillInterval, you can control how often those old files are picked up.
I would also suggest using either file notification mode or incremental listing for better performance.
03-15-2023 06:06 AM
Hi @Lakshay Goel,
So to make sure I correctly understood your answer (see snippet below):
"Since there is no 100% guarantee that all files will be processed, the backfill process runs asynchronously to pick up any old files that have not been processed. Using backFillinterval, you can control how the old files will be processed."
So only the old files that have not yet been processed will be picked up by the backfill?
03-15-2023 06:09 AM
Yes, that is correct
10-23-2023 03:02 AM
Hi @Lakshay Goel,
Where can I set the backfillInterval property in the code? Do you have any sample code for this use case?
11-27-2023 05:36 AM
You set it when you read the files, as an option on the stream reader: .option("cloudFiles.backfillInterval", "1 week")
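For a slightly fuller picture, here is a minimal PySpark sketch of where that option could sit in an Auto Loader read. The helper names (autoloader_options, read_with_backfill), the "json" format and any paths you pass in are only placeholders I made up for illustration. It also assumes you are on a Databricks runtime where the cloudFiles source is available and that spark is your existing session.

```python
from typing import Dict


def autoloader_options(
    file_format: str,
    schema_location: str,
    backfill_interval: str = "1 week",
) -> Dict[str, str]:
    """Collect the Auto Loader options in one place (hypothetical helper)."""
    return {
        # Format of the incoming files, e.g. "json", "csv" or "parquet".
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema (placeholder path).
        "cloudFiles.schemaLocation": schema_location,
        # How often a backfill is triggered to catch files that were missed.
        "cloudFiles.backfillInterval": backfill_interval,
    }


def read_with_backfill(spark, source_path: str, schema_location: str,
                       file_format: str = "json"):
    """Build a streaming read with the backfill option set.

    `spark` is assumed to be an existing SparkSession on Databricks.
    """
    opts = autoloader_options(file_format, schema_location)
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**opts)
        .load(source_path)
    )
```

You would then call something like read_with_backfill(spark, "<your input path>", "<your schema path>") and continue with writeStream as usual. The interval value is just the one from the answer above, so pick whatever fits your latency needs.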