Autoloader clarification

Kanna
New Contributor II

Hi team,

Good day! I would like to know how to perform an incremental load using Auto Loader.
I upload one file to DBFS and write it into a table. When I upload a similar file to the same directory, it does not appear to load incrementally; instead, I see duplicate rows in the target table.

Below is the code I am using. Is there anything I am missing here?

# Read new CSV files from source_dir incrementally with Auto Loader.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", f"{source_dir}/schemaInfer")
    .option("header", "true")
    .load(source_dir)
)

# Append each micro-batch to the target table.
(
    df.writeStream
    .option("checkpointLocation", "dbfs:/FileStore/streamingwritetest/checkpointlocation1")
    .outputMode("append")
    .queryName("writestreamquery")
    .toTable("stream.writestream")
)
 
File:

Country  Citizens
India    10
USA      5
China    10
India    10
Canada   40

Thank you!
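
A note on the behavior described above: Auto Loader keeps track of which files it has already ingested, not which rows, so a second file that repeats earlier rows is treated as new input and appended in full. One possible way to get row-level incremental behavior is to replace the plain append with a Delta MERGE inside foreachBatch. The sketch below is only an illustration under assumptions that are not in the original post: Country is treated as the unique key, stream.writestream already exists as a Delta table, and the helper names merge_condition and upsert_batch are hypothetical. It also assumes PySpark 3.3 or later (for batch_df.sparkSession) and the delta-spark package.

```python
# Assumption: "Country" uniquely identifies a row. Adjust to the real key.
KEY_COLUMNS = ["Country"]
TARGET_TABLE = "stream.writestream"  # table name taken from the post


def merge_condition(keys, target="t", source="s"):
    """Build the join condition for MERGE, e.g. 't.Country = s.Country'."""
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def upsert_batch(batch_df, batch_id):
    """Hypothetical foreachBatch handler: upsert one micro-batch by key."""
    # Imported here so the module loads even where delta-spark is absent.
    from delta.tables import DeltaTable

    # MERGE fails if several source rows match one target row, so keep
    # a single (arbitrary) row per key within the batch first.
    deduped = batch_df.dropDuplicates(KEY_COLUMNS)

    # Assumption: the target Delta table has already been created.
    target = DeltaTable.forName(batch_df.sparkSession, TARGET_TABLE)
    (
        target.alias("t")
        .merge(deduped.alias("s"), merge_condition(KEY_COLUMNS))
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )


# Usage sketch, with df being the Auto Loader stream from the post:
#
# (
#     df.writeStream
#     .foreachBatch(upsert_batch)
#     .option("checkpointLocation",
#             "dbfs:/FileStore/streamingwritetest/checkpointlocation1")
#     .queryName("writestreamquery")
#     .start()
# )
```

With this approach, rows whose key already exists in the table are updated rather than appended again, and only unseen keys are inserted. If Country is not really unique in the data (India appears twice in the sample file), a different key or a dropDuplicates over all columns may be more appropriate.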