Hi @kmorton, Databricks Auto Loader does support backfilling to capture any files that file notifications missed. You enable this with the cloudFiles.backfillInterval option, which schedules regular backfills over your data. However, that option does not let you set a "start_date" or "end_date" for the backfill operation.
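For reference, here is a minimal sketch of how the backfill interval might be set alongside file notification mode. The helper name build_autoloader_options is hypothetical, and the "1 week" interval and "json" format are just example values; the option keys themselves (cloudFiles.format, cloudFiles.useNotifications, cloudFiles.backfillInterval) are Auto Loader options.

```python
def build_autoloader_options(file_format, backfill_interval="1 day", use_notifications=True):
    """Return Auto Loader reader options as a plain dict of strings.

    Hypothetical helper for illustration; only the option keys come from Auto Loader.
    """
    return {
        "cloudFiles.format": file_format,
        # File notification mode instead of directory listing.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # How often Auto Loader runs a backfill to pick up missed files.
        "cloudFiles.backfillInterval": backfill_interval,
    }


options = build_autoloader_options("json", backfill_interval="1 week")

# On a Databricks cluster, where `spark` is predefined, the options could be
# applied roughly like this (source_path and schema_path are placeholders):
#
# df = (
#     spark.readStream.format("cloudFiles")
#     .options(**options)
#     .option("cloudFiles.schemaLocation", schema_path)
#     .load(source_path)
# )
```

Note that this only controls how often backfills run; it does not limit which date range of files gets picked up.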
As I understand your requirement, you want to ingest a very large file system, but not all of it in the initial run. Auto Loader does not appear to have a direct setting for a start date that would limit that first large ingestion.
You might have to manually manage the files you want to ingest initially and then use the backfill functionality to ingest the older data gradually. This could involve moving or copying the files you want to ingest first into a separate directory and pointing Auto Loader to that directory.
Once that data has been ingested, you could point Auto Loader to the older data's directory and use the backfill functionality to ingest it over time.
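As a rough illustration of the staging step, the sketch below copies only files modified on or after a cutoff date into a separate directory. It assumes the files are reachable through a normal filesystem path (for example a mounted volume); the function name stage_recent_files and the example paths are hypothetical, and for cloud object storage you would likely use your cloud provider's copy tooling instead.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def stage_recent_files(source_dir, staging_dir, cutoff):
    """Copy files modified at or after `cutoff` into `staging_dir`.

    Relative paths under `source_dir` are preserved. `staging_dir` should
    live outside `source_dir` so staged copies are not scanned again.
    """
    source = Path(source_dir)
    staging = Path(staging_dir)
    cutoff_ts = cutoff.timestamp()
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime >= cutoff_ts:
            target = staging / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps the modification time
            copied.append(target)
    return copied


# Example call (paths and date are placeholders):
# stage_recent_files("/data/landing", "/data/staging",
#                    datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Auto Loader would then be pointed at the staging directory for the initial load.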
Please note that this is just a suggested approach and may need to be adjusted based on your specific needs and environment.