Reprocessing the data with Auto Loader

Eldar_Dragomir — Wed, 16 Aug 2023 22:59:38 GMT

Could you please give me an idea of how I can start reprocessing my data?
Imagine I have a folder "/test" in ADLS Gen2 containing binaryFiles. They have already been processed by the current pipeline.
I want to reprocess the data and also continue receiving new data.
What settings do I have to set for that?
Do I need two "loads", or can I use a single one with Trigger.AvailableNow and a limit on the number of files per batch?

Re: Reprocessing the data with Auto Loader

Tharun-Kumar — Thu, 17 Aug 2023 04:13:30 GMT

@Eldar_Dragomir

To reprocess the data, you have to point the stream at a new checkpoint directory. Because the new checkpoint has no record of previously ingested files, the stream starts processing the files from the beginning. You can use cloudFiles.maxFilesPerTrigger to limit the number of files processed per micro-batch, which keeps the pipeline stable while it works through the backlog.
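A minimal PySpark sketch of the approach above, assuming it runs on Databricks (the "cloudFiles" source is Databricks-only) with `trigger(availableNow=True)` available (Spark 3.3+). The checkpoint path, table name and default batch size are hypothetical placeholders, not values from this thread. With availableNow the query works through the whole backlog in as many capped micro-batches as needed and then stops; running the same code again later with the same checkpoint picks up only files that arrived since the previous run, so one load can cover both the reprocess and the new data.

```python
# Sketch only: paths, table name and batch size below are hypothetical.
# `spark` is assumed to be the active SparkSession on a Databricks cluster.


def start_reprocessing_stream(spark, source_path, new_checkpoint_path,
                              target_table, max_files_per_batch=100):
    """Start an Auto Loader query that re-reads every file under source_path.

    new_checkpoint_path must not have been used before: an empty checkpoint
    has no record of ingested files, so all existing files are discovered.
    """
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "binaryFile")
        # Cap how many files each micro-batch picks up.
        .option("cloudFiles.maxFilesPerTrigger", max_files_per_batch)
        .load(source_path)
    )
    return (
        df.writeStream
        # A fresh checkpoint location is what causes the full reprocess.
        .option("checkpointLocation", new_checkpoint_path)
        # Drain everything currently available in capped batches, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Example call with hypothetical names: `query = start_reprocessing_stream(spark, "/test", "/checkpoints/test_v2", "test_bronze")`, followed by `query.awaitTermination()`. Scheduling this as a recurring job keeps ingesting new files incrementally from the same checkpoint.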
