Data Engineering

Roll back to previous version of an AutoLoader checkpoint file

DomDuf
New Contributor II

I know that to "reset" AutoLoader, you can delete the checkpoint file entirely. I was wondering whether it's possible, and if so how, to:

  • Roll the checkpoint file back to a previous version so I can reload certain files that were already processed
  • Delete certain rows in the checkpoint file (by creation date for example)

Thanks for your help.

1 ACCEPTED SOLUTION


Murthy1
Contributor II

Hello @Dominic Dufour,

I had the same question, and unfortunately it is not possible yet.

However, you can delete the checkpoint file and use the Autoloader option "modifiedAfter" to pick up only the files modified after a specific time. This is a one-time activity, since the checkpoint is created again and tracks future loads from there. You can remove "modifiedAfter" once your data loads are back on track.
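For illustration, here is a minimal sketch of that one-time reload in PySpark. The helper names (`autoloader_options`, `start_reload`), the paths, and the table name are hypothetical; the timestamp format is the one shown in the Databricks option reference, and `spark` is assumed to be an existing session on a Databricks cluster. Treat it as a starting point under those assumptions, not a definitive implementation.

```python
def autoloader_options(file_format, schema_location, modified_after=None):
    """Build the Auto Loader reader options (hypothetical helper).

    modified_after is a timestamp string such as
    "2023-03-01 00:00:00.000000 UTC+0"; leave it as None for normal runs.
    """
    opts = {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }
    if modified_after is not None:
        # Only files modified after this timestamp are picked up.
        opts["modifiedAfter"] = modified_after
    return opts


def start_reload(spark, source_path, checkpoint_path, target_table, options):
    """Start the stream against a fresh (deleted or new) checkpoint location."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is available, then stop
        .toTable(target_table)
    )


# Example (assumed paths and table name):
# opts = autoloader_options("json", "/mnt/schemas/events",
#                           "2023-03-01 00:00:00.000000 UTC+0")
# start_reload(spark, "/mnt/landing/events", "/mnt/checkpoints/events",
#              "bronze.events", opts)
```

After the reload has caught up, calling `autoloader_options` without `modified_after` gives the regular configuration again.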

The other workaround I can think of is a bit more intensive: you can schedule a job to copy the checkpoint files to a different location frequently, with date-time markers (probably in the folder name). You can then restore the checkpoint files from that location when you want to reset Autoloader.
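A rough sketch of that backup-and-restore idea follows. It assumes the checkpoint directory is reachable as a regular file path (for example through a local mount) and that the stream is stopped while copying, so the snapshot is consistent. The function names and folder naming scheme are hypothetical; on cloud storage you would likely swap the `shutil` calls for your platform's file utilities.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_checkpoint(checkpoint_dir: str, backup_root: str) -> Path:
    """Copy the whole checkpoint directory into a timestamped backup folder."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_root) / f"checkpoint_{stamp}"
    shutil.copytree(checkpoint_dir, target)  # fails if target already exists
    return target


def restore_checkpoint(backup_dir: str, checkpoint_dir: str) -> None:
    """Replace the current checkpoint directory with a saved copy."""
    dest = Path(checkpoint_dir)
    if dest.exists():
        shutil.rmtree(dest)  # drop the current checkpoint state
    shutil.copytree(backup_dir, dest)
```

Keep in mind that after restoring an older copy, files ingested since that backup will be treated as new and processed again, so the target may need deduplication.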


3 REPLIES

DomDuf
New Contributor II

Thanks for the quick answer, this was really helpful!

MRTN
New Contributor III

This would for sure be a useful feature.
