Data Engineering

Autoloader start and end date for ingestion

kmorton
New Contributor

I have been searching for a way to set up backfilling with autoloader using an option such as "start_date" or "end_date". I am ingesting a massive file system, but I don't want to ingest everything from the beginning. Instead, I want the first big ingestion to start from a given date so that the most recent data lands in my database first, and then slowly backfill the older data over time. Does autoloader currently support this, and if not, any suggestions on how to approach it?

1 REPLY

cgrant
Databricks Employee

If the files have already been loaded by autoloader (that is, the same name and path), this can be tricky, because the stream's checkpoint already records those files as processed.

I recommend starting a separate autoloader stream and specifying filters on it to match your start and end dates. If you'd instead like to rely on the modification timestamps of the files, you can use the modifiedBefore and modifiedAfter options.
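As a rough illustration of the modification-timestamp approach, here is a minimal PySpark sketch. The helper names (backfill_options, run_backfill), the paths, the table name, and the timestamp string format are all assumptions made for this example, not something confirmed in this thread; check the accepted timestamp format against the Auto Loader options documentation for your runtime. The second function only runs on a Databricks cluster where the cloudFiles source is available.

```python
from datetime import datetime
from typing import Dict, Optional


def backfill_options(
    file_format: str,
    schema_location: str,
    modified_after: Optional[datetime] = None,
    modified_before: Optional[datetime] = None,
) -> Dict[str, str]:
    """Build reader options that limit ingestion to a modification-time window.

    Hypothetical helper: the name and the timestamp format are assumptions.
    """
    if modified_after and modified_before and modified_after >= modified_before:
        raise ValueError("modified_after must be earlier than modified_before")

    fmt = "%Y-%m-%dT%H:%M:%S"  # assumed format, e.g. 2024-01-01T00:00:00
    opts = {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }
    # Only set a bound when the caller supplied one.
    if modified_after is not None:
        opts["modifiedAfter"] = modified_after.strftime(fmt)
    if modified_before is not None:
        opts["modifiedBefore"] = modified_before.strftime(fmt)
    return opts


def run_backfill(spark, source_path, target_table, checkpoint_path, options):
    """Start a separate stream for one window (requires a Databricks runtime)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # separate checkpoint per stream
        .trigger(availableNow=True)  # process what matches the window, then stop
        .toTable(target_table)
    )


# Example window: files modified during January 2024 (illustrative dates only).
recent = backfill_options(
    "json",
    "/tmp/example/schema",
    modified_after=datetime(2024, 1, 1),
    modified_before=datetime(2024, 2, 1),
)
```

With something like this, the first large ingestion could use a recent window, and later runs could use older windows, each as its own stream with its own checkpoint location so the streams do not share file-tracking state.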
