Hello,
I have been reading the Databricks Auto Loader documentation on the cloudFiles.backfillInterval option, and I still have a question about how exactly it works. The only examples I could find set it to 1 day or 1 week, so I'm assuming it accepts any interval, such as x hours, x days, x weeks, or x months. My question is: how does it use that interval (say, 1 week) to backfill?
Does it look at the lastModified time of the unprocessed files arriving in the input directory and pick up those where currentTime - lastModified <= backfillInterval?
Or does it run the backfill once per interval, so that if I last ran the Databricks Auto Loader pipeline a week ago, the next run will perform a backfill? In that case, does the backfill simply go through all the files in the input directory, compare them against cloud_files_state, and make sure every one of them has been processed?
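To make the two readings concrete, here is a rough Python sketch of what I have in mind. It is only my guess at the logic, not Auto Loader's actual implementation; the function names, file names, and dates are all made up for illustration.

```python
from datetime import datetime, timedelta


def pick_by_age(files, processed, now, backfill_interval):
    """Reading 1 (my guess): take unprocessed files whose age is within the interval."""
    return sorted(
        name
        for name, last_modified in files.items()
        if name not in processed and now - last_modified <= backfill_interval
    )


def pick_by_periodic_scan(files, processed, now, last_backfill, backfill_interval):
    """Reading 2 (my guess): once per interval, list everything and take whatever is unprocessed."""
    if now - last_backfill < backfill_interval:
        return []  # no backfill is due yet
    return sorted(name for name in files if name not in processed)


# Hypothetical input directory listing: file name -> lastModified time.
now = datetime(2024, 1, 15)
files = {
    "a.json": datetime(2024, 1, 14),  # 1 day old, not processed
    "b.json": datetime(2024, 1, 1),   # 14 days old, not processed
    "c.json": datetime(2024, 1, 13),  # 2 days old, already processed
}
processed = {"c.json"}  # stand-in for what the stream's file state already records
week = timedelta(weeks=1)

picked_by_age = pick_by_age(files, processed, now, week)
picked_by_scan = pick_by_periodic_scan(
    files, processed, now, last_backfill=datetime(2024, 1, 8), backfill_interval=week
)
print(picked_by_age)
print(picked_by_scan)
```

Under the first reading, the 14-day-old file b.json would never be picked up with a 1 week interval; under the second, it would be caught by the next scheduled backfill. That difference is what I'm trying to understand.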
I'm not getting a clear picture of what exactly backfillInterval does. It does seem useful, though: the documentation says it guarantees that 100% of files get processed.