How to reset an autoloader?

Scouty
New Contributor

Hi,

I'm using Auto Loader with Azure Databricks:

df = (spark.readStream.format("cloudFiles")
    .options(**cloudfile)
    .load("abfss://dev@std******.dfs.core.windows.net/**/*****"))

Some files and subdirectories are created in my target checkpointLocation folder as a result.
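For reference, this is roughly what the write side with that checkpointLocation looks like; the paths below are placeholders rather than my real ones:

# Write side of the stream: Auto Loader tracks which source files it has already
# processed under the checkpointLocation configured here (placeholder paths).
checkpoint_path = "abfss://dev@<storage-account>.dfs.core.windows.net/checkpoints/raw_ingest"
target_path = "abfss://dev@<storage-account>.dfs.core.windows.net/bronze/raw_ingest"

(df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .start(target_path))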

It detects and processes new files, which is OK.

Also, when I restart my cluster, it again processes only the new files, which is OK.

But if I want to reset the Auto Loader so that it re-processes all files from the source folder, I could not find anything on how to do so.

Can someone please give me a hint?

Accepted Solution

AmanSehgal
Honored Contributor III

Change the checkpoint location or delete the existing checkpoint location.

A new checkpoint location means that the previous stream has been abandoned and a fresh stream has been started.
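A minimal sketch of both options in a notebook, assuming a hypothetical checkpoint_path variable; dbutils.fs.rm and the writeStream options used here are standard Databricks / Structured Streaming APIs:

# Stop the running stream first, then either:

# Option 1: delete the existing checkpoint. On the next start Auto Loader has no
# record of processed files, so it ingests everything in the source folder again
# (cloudFiles.includeExistingFiles defaults to true).
checkpoint_path = "abfss://dev@<storage-account>.dfs.core.windows.net/checkpoints/raw_ingest"  # hypothetical
dbutils.fs.rm(checkpoint_path, True)  # recursive delete

# Option 2: leave the old checkpoint in place and point the write to a new one,
# which also starts the stream from scratch.
new_checkpoint_path = checkpoint_path + "_v2"
(df.writeStream
    .format("delta")
    .option("checkpointLocation", new_checkpoint_path)
    .start("abfss://dev@<storage-account>.dfs.core.windows.net/bronze/raw_ingest"))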


2 Replies


Anonymous
Not applicable

@Aman Sehgal - My name is Piper, and I'm one of the moderators for Databricks. I wanted to jump in real quick to thank you for being so generous with your knowledge. 🙂
