02-10-2022 02:21 AM
Hi,
I'm using Auto Loader with Azure Databricks:
df = (spark.readStream.format("cloudFiles")
      .options(**cloudfile)
      .load("abfss://dev@std******.dfs.core.windows.net/**/*****"))
In my target checkpointLocation folder, some files and subdirectories are created as a result.
The stream detects and processes new files, which is OK.
Also, when I restart my cluster, it again processes only the new files, which is OK.
But if I want to restart Auto Loader so that it re-processes all files from the source folder, I could not find anything on how to do so.
Can someone please give me a hint?
02-10-2022 04:58 AM
Change the checkpoint location, or delete the existing checkpoint location.
A new checkpoint location means the previous stream has been abandoned and a new stream has started, so the source folder is read from the beginning.
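A minimal sketch of this approach: the helper below generates a fresh, unused checkpoint path (the base path, the `cloudfile` options, and the source/target paths are placeholders, not from the original post), and the commented-out portion shows how the restarted stream would use it on Databricks.

```python
import time

def fresh_checkpoint(base: str) -> str:
    """Return a new, unused checkpoint path by appending a timestamp suffix.

    Pointing the stream at this path makes Structured Streaming treat it
    as a brand-new stream, so Auto Loader re-processes all source files.
    """
    return f"{base.rstrip('/')}/run_{int(time.time())}"

# On a Databricks cluster (not runnable locally), the restarted stream
# would look roughly like this -- paths and options are assumptions:
#
# df = (spark.readStream.format("cloudFiles")
#       .options(**cloudfile)
#       .load(source_path))
#
# (df.writeStream
#    .option("checkpointLocation",
#            fresh_checkpoint("/mnt/checkpoints/autoloader"))
#    .start(target_path))
#
# Alternatively, delete the old checkpoint instead of creating a new one:
# dbutils.fs.rm(old_checkpoint_path, recurse=True)
```

Using a timestamped subfolder keeps old checkpoints around for inspection; deleting the old checkpoint achieves the same reset but discards the stream's history.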
02-10-2022 06:55 AM
@Aman Sehgal - My name is Piper, and I'm one of the moderators for Databricks. I wanted to jump in real quick to thank you for being so generous with your knowledge. 🙂