09-03-2024 10:31 PM
Autoloader checkpoint fails, and after changing the checkpoint path I have to reload all the data. I want to load only the data that has not yet been processed; I don't want to reload everything.
09-03-2024 10:53 PM
Hi @Subhasis,
Could you provide more details, like the exact error message, your Autoloader configuration, etc.? That will make it easier for us to help you.
09-03-2024 10:54 PM
@Subhasis
What do you exactly mean by "Autoloader Checkpoint fails"?
How did you change the checkpoint path? Simply by specifying a new path?
If yes, then it's normal that it tries to reload all data: the new checkpoint is empty, so Auto Loader treats every file it finds as unprocessed.
What you could do is specify the modifiedAfter option to set a cutoff timestamp, as in the sketch below.
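For example, a minimal sketch (the paths, source format, and table name below are placeholders, not taken from your setup):

```python
# Minimal sketch: Auto Loader stream that skips files modified before a cutoff.
# All paths, the source format, and the table name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                        # your actual source format
    .option("cloudFiles.schemaLocation", "/mnt/schemas/demo")   # placeholder schema path
    .option("modifiedAfter", "2024-08-01 00:00:00.000000 UTC")  # cutoff: ignore older files
    .load("/mnt/landing/demo")                                  # placeholder source path
)

(
    df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/demo_v2")   # the new checkpoint path
    .trigger(availableNow=True)
    .toTable("target_delta_table")                              # placeholder target table
)
```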
09-03-2024 11:28 PM
No error is shown. It reads the files but does not write the data into the Delta table. Once I noticed that data was not being written, I created a new checkpoint path, and now it is reloading all the data. How do I avoid this situation? I tried modifiedAfter, but I cannot tell from when the data stopped being written, since the job itself never fails.
09-03-2024 11:30 PM
Does the checkpoint have some capacity limit after which it stops writing data?
09-03-2024 11:32 PM - edited 09-03-2024 11:36 PM
You can use the cloud_files_state function to see which files have been processed by Autoloader and recorded in the checkpoint.
I'm assuming that in your case there is some misconfiguration that's causing the problem.
cloud_files_state table-valued function | Databricks on AWS
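A minimal sketch of querying it (the checkpoint path is a placeholder for your stream's actual checkpoint location):

```python
# Inspect which files Auto Loader has recorded in a given checkpoint.
# '/mnt/checkpoints/demo' is a placeholder; use your stream's checkpoint path.
files = spark.sql("SELECT * FROM cloud_files_state('/mnt/checkpoints/demo')")
files.show(truncate=False)
```

Comparing this listing against the source directory should show from which point files stopped being committed.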