Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoloader checkpoint fails, and after changing the checkpoint path all data needs to be reloaded

Subhasis
New Contributor II

The Autoloader checkpoint fails, and after I change the checkpoint path it needs to reload all the data. I want to load only the data that has not been processed yet. I don't want to reload everything.

5 REPLIES

szymon_dybczak
Contributor III

Hi @Subhasis ,

Could you provide more details, such as the exact error message and your Autoloader configuration? It will be easier for us to help you.

daniel_sahal
Esteemed Contributor

@Subhasis 
What exactly do you mean by "Autoloader Checkpoint fails"?
How did you change the checkpoint path? Simply by specifying a new path?
If so, it is expected that it tries to reload all data: the new checkpoint is empty, so Autoloader treats every file it finds as unprocessed.
What you could do is specify the modifiedAfter option to set a cutoff date.
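As a rough illustration of the modifiedAfter suggestion, here is a minimal PySpark sketch. The helper names (`build_autoloader_options`, `start_stream`), the paths, the file format, and the cutoff timestamp are all hypothetical placeholders, not taken from the original poster's job. The stream itself only runs on a Databricks cluster.

```python
from typing import Dict, Optional


def build_autoloader_options(
    file_format: str,
    schema_location: str,
    modified_after: Optional[str] = None,
) -> Dict[str, str]:
    """Collect Auto Loader reader options, optionally with a cutoff timestamp."""
    options = {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }
    if modified_after is not None:
        # Files last modified at or before this timestamp are skipped.
        options["modifiedAfter"] = modified_after
    return options


def start_stream(spark, source_path, checkpoint_path, target_table, options):
    """Start Auto Loader against a new checkpoint (requires a Databricks Spark session)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream.option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .toTable(target_table)
    )


# Hypothetical values for illustration only.
options = build_autoloader_options(
    file_format="json",
    schema_location="/tmp/example/_schema",
    modified_after="2024-01-01 00:00:00.000000 UTC+0",
)
```

On a cluster, `start_stream(spark, "/tmp/example/landing", "/tmp/example/_checkpoint_v2", "example_table", options)` would then ingest only files modified after the cutoff.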

Subhasis
New Contributor II

No error is shown. It reads the files but does not write the data into the Delta table. When I noticed that it was not writing data, I created a new checkpoint path, and then it started reloading all the data. How can I avoid this situation? I tried modifiedAfter, but I cannot identify when the data stopped being written, because the job itself does not fail.
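One possible way to narrow down when writes stopped, assuming the target is a Delta table written by a streaming job, is to look at the table history and take the timestamp of the last streaming commit. The sketch below is only an assumption about how that could be done. The helper names and the table name are hypothetical, and the `"STREAMING UPDATE"` operation label should be checked against the actual `DESCRIBE HISTORY` output.

```python
from datetime import datetime
from typing import Iterable, Mapping, Optional


def last_streaming_commit(history_rows: Iterable[Mapping]) -> Optional[datetime]:
    """Return the newest timestamp among streaming commits, or None if there are none."""
    timestamps = [
        row["timestamp"]
        for row in history_rows
        # Assumption: streaming writes are recorded with this operation name.
        if row["operation"] == "STREAMING UPDATE"
    ]
    return max(timestamps) if timestamps else None


def fetch_history(spark, table_name: str):
    """Load Delta table history as a list of dicts (requires a Spark session)."""
    return [row.asDict() for row in spark.sql(f"DESCRIBE HISTORY {table_name}").collect()]


# Hand-written sample rows standing in for real history output.
sample_history = [
    {"timestamp": datetime(2024, 1, 1, 8, 0), "operation": "STREAMING UPDATE"},
    {"timestamp": datetime(2024, 1, 3, 9, 30), "operation": "STREAMING UPDATE"},
    {"timestamp": datetime(2024, 1, 5, 12, 0), "operation": "OPTIMIZE"},
]
cutoff = last_streaming_commit(sample_history)
```

The resulting timestamp could serve as a starting point for a modifiedAfter cutoff, though files that arrived shortly before the last commit may need a safety margin.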

 

Subhasis
New Contributor II

Does a checkpoint have some capacity limit after which it stops writing data?

 

You can use the cloud_files_state function to see which files have been processed by Autoloader and recorded in the checkpoint.

I'm assuming that in your case some misconfiguration is causing the problem.

cloud_files_state table-valued function | Databricks on AWS
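A minimal sketch of querying cloud_files_state from Python follows. The helper names and the checkpoint path are hypothetical, and the query only runs on a Databricks cluster where the function is available.

```python
def cloud_files_state_query(checkpoint_path: str) -> str:
    """Build the SQL that lists the files recorded in an Auto Loader checkpoint."""
    if "'" in checkpoint_path:
        # Keep the sketch simple: refuse paths that would break the SQL string literal.
        raise ValueError("checkpoint path must not contain single quotes")
    return f"SELECT * FROM cloud_files_state('{checkpoint_path}')"


def processed_files(spark, checkpoint_path: str):
    """Return a DataFrame of files tracked by the checkpoint (requires Databricks)."""
    return spark.sql(cloud_files_state_query(checkpoint_path))


# Hypothetical checkpoint path for illustration only.
query = cloud_files_state_query("/tmp/example/_checkpoint")
```

Running this against the old checkpoint path, rather than the newly created one, would show which files Autoloader had already recorded before the writes stopped.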
