02-10-2022 02:21 AM
Hi,
I'm using Auto Loader with Azure Databricks:
df = (spark.readStream.format("cloudFiles")
      .options(**cloudfile)
      .load("abfss://dev@std******.dfs.core.windows.net/**/*****"))
In my target checkpointLocation folder, some files and subdirectories are created as a result.
The stream detects and processes new files, which is OK.
Also, when I restart my cluster, it again processes only the new files, which is OK.
But if I want to restart Auto Loader so that it re-processes all files from the source folder, I could not find anything on how to do so.
Can someone please give me a hint?
02-10-2022 04:58 AM
Change the checkpoint location, or delete the existing checkpoint location.
A new checkpoint location means the previous stream has been abandoned and a new stream has started, so the source folder is read from the beginning.
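A minimal sketch of this approach: the helper below generates a fresh, unused checkpoint path (the base path, the `cloudfile` options, and the source/target paths are placeholders, not from the original post), and the commented-out portion shows how the restarted stream would use it on Databricks.

```python
import time

def fresh_checkpoint(base: str) -> str:
    """Return a new, unused checkpoint path by appending a timestamp suffix.

    Pointing the stream at this path makes Structured Streaming treat it
    as a brand-new stream, so Auto Loader re-processes all source files.
    """
    return f"{base.rstrip('/')}/run_{int(time.time())}"

# On a Databricks cluster (not runnable locally), the restarted stream
# would look roughly like this -- paths and options are assumptions:
#
# df = (spark.readStream.format("cloudFiles")
#       .options(**cloudfile)
#       .load(source_path))
#
# (df.writeStream
#    .option("checkpointLocation",
#            fresh_checkpoint("/mnt/checkpoints/autoloader"))
#    .start(target_path))
#
# Alternatively, delete the old checkpoint instead of creating a new one:
# dbutils.fs.rm(old_checkpoint_path, recurse=True)
```

Using a timestamped subfolder keeps old checkpoints around for inspection; deleting the old checkpoint achieves the same reset but discards the stream's history.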
02-10-2022 06:55 AM
@Aman Sehgal - My name is Piper, and I'm one of the moderators for Databricks. I wanted to jump in real quick to thank you for being so generous with your knowledge. 🙂