Autoloader clarification

Kanna · 01-20-2025

Hi team,

Good day! I would like to know how to perform an incremental load using Auto Loader.
I upload one file to DBFS and write it to a table. When I upload a similar file to the same directory, I do not get an incremental load; instead, I see duplicate rows in the target table.

Below is the code I am using. Is there anything I am missing here?

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", f"{source_dir}/schemaInfer")
    .option("header", "true")
    .load(source_dir)
)

(
    df.writeStream
    .option("checkpointLocation", "dbfs:/FileStore/streamingwritetest/checkpointlocation1")
    .outputMode("append")
    .queryName("writestreamquery")
    .toTable("stream.writestream")
)

File:

Country	Citizens
India	10
USA	5
China	10
India	10
Canada	40
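
To make the behavior concrete, here is a minimal plain-Python sketch (no Spark involved) of what I see versus what I expected. The helper names `append_load` and `upsert_load` are hypothetical, and treating Country as the key is my own assumption. As far as I understand, Auto Loader tracks which files it has already processed, not which rows, so a second file with the same content is treated as new data and appended in full.

```python
# Plain-Python illustration only; this does not call Spark or Auto Loader.
# Rows copied from the sample file above: (Country, Citizens).
FILE_ROWS = [
    ("India", 10),
    ("USA", 5),
    ("China", 10),
    ("India", 10),
    ("Canada", 40),
]


def append_load(table, new_rows):
    """Mimic append output mode: every row of a newly arrived file is added as-is."""
    return table + list(new_rows)


def upsert_load(table, new_rows):
    """Mimic a keyed merge (assumed key: Country): one row per key, latest value wins."""
    merged = dict(table)
    merged.update(new_rows)
    return list(merged.items())


# What I observe: the same content uploaded as a second file is appended again.
after_first_file = append_load([], FILE_ROWS)
after_second_file = append_load(after_first_file, FILE_ROWS)
print(len(after_first_file), len(after_second_file))

# What I expected: re-loading the same content leaves one row per Country.
upserted = upsert_load(upsert_load([], FILE_ROWS), FILE_ROWS)
print(upserted)
```

Note that the sample file already contains the India row twice, so append mode would produce a duplicate even from a single file.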

Thank you!