Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoloader start and end date for ingestion

kmorton
New Contributor

I have been looking for a way to set up backfilling with Auto Loader using something like a "start_date" or "end_date" option. I am ingesting a very large file system, but I don't want to ingest everything from the beginning. I want the first big ingestion to start from a chosen start date so that the most recent data lands in my database first, and then backfill the older data gradually over time. Does Auto Loader currently support this, and if not, any suggestions on how to approach it?

1 REPLY

cgrant
Databricks Employee

If Auto Loader has already ingested the files (same name and path), this can be tricky, because the existing stream's checkpoint already records them as processed.

I recommend starting a separate Auto Loader stream and applying filters to it that match your start and end dates. If you would rather rely on the files' modification timestamps, you can use the modifiedBefore and modifiedAfter options.
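As a rough illustration of the modification-timestamp approach, here is a minimal sketch. The helper names (date_window_options, start_windowed_stream), the paths, and the table name are hypothetical, and the timestamp string format is an assumption based on the format shown in the Spark file source docs, so check it against the Auto Loader docs for your runtime. It also assumes a Databricks runtime where trigger(availableNow=True) is available.

```python
from datetime import datetime
from typing import Optional


def date_window_options(start: Optional[datetime] = None,
                        end: Optional[datetime] = None) -> dict:
    """Build reader options that restrict ingestion by file modification time."""
    if start is not None and end is not None and start >= end:
        raise ValueError("start must be earlier than end")
    options = {}
    # Assumed timestamp format: YYYY-MM-DDTHH:mm:ss
    if start is not None:
        options["modifiedAfter"] = start.strftime("%Y-%m-%dT%H:%M:%S")
    if end is not None:
        options["modifiedBefore"] = end.strftime("%Y-%m-%dT%H:%M:%S")
    return options


def start_windowed_stream(spark, source_path, target_table, checkpoint_path,
                          file_format, start=None, end=None):
    """Start a separate Auto Loader stream limited to one modification-time window.

    Each window gets its own checkpoint_path so it tracks files independently
    of any other stream reading the same source.
    """
    reader = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .options(**date_window_options(start, end))
    )
    return (
        reader.load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is available, then stop
        .toTable(target_table)
    )
```

With this shape, the first big ingestion could call start_windowed_stream with only start set to your cutoff, and later backfill runs could pass older start/end windows, each with its own checkpoint path. How files whose timestamp falls exactly on a window boundary are treated is worth verifying before relying on adjacent windows.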
