08-16-2023 03:59 PM
Could you give me an idea of how to start reprocessing my data?
Imagine I have a folder "/test" in ADLS Gen2 containing binary files that have already been processed by the current pipeline.
I want to reprocess that data while continuing to receive new data.
What settings do I need for that?
Do I need two separate loads, or can I use a single one with Trigger.AvailableNow and a per-batch file limit?
Labels: Delta Lake, Spark
Accepted Solutions
08-16-2023 09:13 PM
To reprocess the data, point the stream at a new checkpoint directory. Auto Loader tracks already-ingested files in the checkpoint, so a fresh checkpoint makes it process everything in the folder from the beginning while still picking up new files as they arrive. You can use cloudFiles.maxFilesPerTrigger to limit the number of files processed per micro-batch and keep the pipeline stable, so a single load is enough.
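A minimal sketch of what that single load could look like, assuming an Auto Loader (cloudFiles) stream written in PySpark; the storage URL, checkpoint path, and target table name are placeholders, and Trigger.AvailableNow needs DBR 10.4+ / Spark 3.3+:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the binary files with Auto Loader; the binaryFile format has a fixed
# schema (path, modificationTime, length, content), so no schema hints needed.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    # Cap the number of files per micro-batch to keep the pipeline stable.
    .option("cloudFiles.maxFilesPerTrigger", "100")
    .load("abfss://container@account.dfs.core.windows.net/test")  # placeholder path
)

(
    df.writeStream
    # A NEW checkpoint directory makes the stream treat every existing file
    # as unseen, so the whole folder is reprocessed; later runs against the
    # same checkpoint pick up only newly arrived files.
    .option("checkpointLocation", "/checkpoints/test_v2")  # placeholder path
    # AvailableNow drains the current backlog in rate-limited micro-batches
    # and then stops; rerun or schedule the job to ingest new files.
    .trigger(availableNow=True)
    .toTable("bronze.test_binary")  # placeholder table name
)
```

With this setup the first run against the new checkpoint replays the whole folder in bounded batches, and every subsequent run ingests only the files that arrived since the previous one.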

