Data Engineering

Autoloader start and end date for ingestion

kmorton
New Contributor

I have been searching for a way to set up backfilling with Auto Loader using an option such as "start_date" or "end_date". I am ingesting a massive file system, but I don't want to ingest everything from the beginning. I want the first big ingestion to start from a given date so that the most recent data lands in my database first, and then slowly backfill the older data over time. Does Auto Loader currently have this functionality in its settings? If not, any suggestions on how to approach this?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @kmorton, Databricks Auto Loader does support backfilling to capture any files missed by file notifications. You enable this with the cloudFiles.backfillInterval option, which schedules regular backfills over your data. However, it does not provide an option to set a "start_date" or "end_date" for the backfill operation.
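
As a rough illustration of the backfill option mentioned above, here is a minimal sketch of how the options might be assembled. The helper names (autoloader_options, read_stream), the "1 day" interval, and the paths are assumptions made for the example, not values from this thread; the cloudFiles.* keys are the documented Auto Loader option names.

```python
from typing import Dict


def autoloader_options(file_format: str, schema_location: str,
                       backfill_interval: str = "1 day") -> Dict[str, str]:
    """Build Auto Loader options with file notifications and periodic backfills.

    The helper itself is hypothetical; it only collects option strings.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # File notification mode; backfills catch files the notifications missed.
        "cloudFiles.useNotifications": "true",
        "cloudFiles.backfillInterval": backfill_interval,
    }


def read_stream(spark, source_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame.

    `spark` is assumed to be an active SparkSession on a Databricks cluster.
    """
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)
```

On a cluster this could be used as read_stream(spark, "/path/to/source", autoloader_options("json", "/path/to/schema")), with both paths replaced by your own locations.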

As I understand your requirement, you want to ingest a massive file system, but not all of it from the beginning. Auto Loader does not appear to have a direct setting for a start date on the first large ingestion.

You might have to manually manage the files you want to ingest initially and then use the backfill functionality to ingest older data slowly. This could involve moving or copying the files you want to ingest into a separate directory and pointing the Auto Loader to this directory.
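
The staging step described above could look roughly like the following. This is only a sketch under stated assumptions: the source is reachable as an ordinary file path (for example through a mount), file modification time is the date that matters, and the staging directory sits outside the source directory. The function name stage_recent_files is made up for the example.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List


def stage_recent_files(source_dir: str, staging_dir: str,
                       start_date: datetime) -> List[Path]:
    """Copy files modified on or after start_date into staging_dir.

    Relative paths under source_dir are preserved. staging_dir is assumed
    to be outside source_dir so copied files are not scanned again.
    """
    source = Path(source_dir)
    staging = Path(staging_dir)
    cutoff = start_date.timestamp()
    copied: List[Path] = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime >= cutoff:
            target = staging / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 keeps the modification time
            copied.append(target)
    return copied
```

Auto Loader would then be pointed at the staging directory for the first ingestion. For object storage accessed only through cloud APIs, the same selection logic would need that storage's own listing and copy calls instead of pathlib and shutil.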

Once this data has been ingested, you could point the Auto Loader to the older data's directory and use the backfill functionality to ingest this data over time. 
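
Depending on your runtime, the generic file source options modifiedAfter and modifiedBefore may offer a way to split the recent and older ingestion passes without moving files. Treat this as an assumption to verify against the Auto Loader options documentation for your environment, not as a confirmed answer. The helper name date_window_options and the example date are made up for the sketch.

```python
from datetime import datetime
from typing import Dict, Optional

# Timestamp layout expected by Spark's modifiedAfter / modifiedBefore options.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def date_window_options(base_options: Dict[str, str],
                        modified_after: Optional[datetime] = None,
                        modified_before: Optional[datetime] = None) -> Dict[str, str]:
    """Return a copy of base_options restricted to a modification-time window."""
    options = dict(base_options)  # leave the caller's dict untouched
    if modified_after is not None:
        options["modifiedAfter"] = modified_after.strftime(TS_FORMAT)
    if modified_before is not None:
        options["modifiedBefore"] = modified_before.strftime(TS_FORMAT)
    return options


base = {"cloudFiles.format": "json"}
start_date = datetime(2023, 1, 1)  # example cutoff, not from the thread

# First pass: only files modified after the cutoff.
recent_options = date_window_options(base, modified_after=start_date)
# Later pass: files modified before the cutoff.
older_options = date_window_options(base, modified_before=start_date)
```

Each pass would be its own stream with its own checkpoint location, so the two do not share file-tracking state.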

Please note that this is just a suggested approach and may need to be adjusted based on your specific needs and environment.
