Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Reprocessing the data with Auto Loader

Eldar_Dragomir
New Contributor II

Could you please give me an idea of how to start reprocessing my data?
Imagine I have a folder "/test" in ADLS Gen2 containing binary files. They have already been processed by the current pipeline.
I want to reprocess that data and also keep receiving new data.
What settings do I need for that?
Do I need two "loads", or can I use one with Trigger.AvailableNow and a limit on files per batch?

1 ACCEPTED SOLUTION


Tharun-Kumar
Databricks Employee

@Eldar_Dragomir 

To re-process the data, we have to change the checkpoint directory. This will start processing the files from the beginning. You can use cloudFiles.maxFilesPerTrigger to limit the number of files processed per micro-batch and keep the pipeline stable.
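
A minimal sketch of what this could look like in PySpark (the ADLS paths, checkpoint location, and target table name below are placeholders, not from your pipeline; `spark` is the SparkSession a Databricks notebook provides):

```python
# Re-process "/test" with Auto Loader while continuing to pick up new files.
# A NEW checkpoint location discards Auto Loader's record of already-ingested
# files, so the directory is listed and processed from the beginning.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    # Cap the number of files per micro-batch so the backfill stays stable.
    .option("cloudFiles.maxFilesPerTrigger", 100)
    .load("abfss://container@account.dfs.core.windows.net/test")  # placeholder path
)

query = (
    df.writeStream
    # Fresh checkpoint directory => start over from the first file.
    .option("checkpointLocation",
            "abfss://container@account.dfs.core.windows.net/_checkpoints/test_v2")
    # AvailableNow drains the backlog in rate-limited micro-batches, then stops;
    # re-running the same query later ingests only files that arrived since.
    .trigger(availableNow=True)
    .toTable("main.bronze.test_binary")  # placeholder table
)
```

On your single-load question: Trigger.AvailableNow respects rate limits such as cloudFiles.maxFilesPerTrigger, so one stream should be able to handle both the backfill and, on later runs, only the newly arrived files; a second load isn't required.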


