01-20-2025 09:50 PM
Hi team,
Good day! I would like to know how to perform an incremental load using Auto Loader.
I upload one file to DBFS and stream it into a table. When I upload a similar file to the same directory, the load is not incremental; instead, I see duplicate rows in the target table.
Below is the code I am using. Is there anything I am missing here?
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", f"{source_dir}/schemaInfer")
    .option("header", "true")
    .load(source_dir)
)

(
    df.writeStream
    .option("checkpointLocation", "dbfs:/FileStore/streamingwritetest/checkpointlocation1")
    .outputMode("append")
    .queryName("writestreamquery")
    .toTable("stream.writestream")
)
File:
| Country | Citizens |
| India | 10 |
| USA | 5 |
| China | 10 |
| India | 10 |
| Canada | 40 |
Thank you!
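A possible explanation, offered as a sketch rather than a confirmed fix: Auto Loader keeps track of which files it has already ingested, not which rows. A second file is therefore treated as entirely new data, and with outputMode("append") every row in it is added to the table, including rows that already exist. One common way to avoid this is to write through foreachBatch and MERGE each micro-batch into the Delta table. The code below assumes that a row is identified by both Country and Citizens, that stream.writestream already exists as a Delta table, and that a fresh checkpoint location is passed in. The names merge_condition, upsert_batch, and start_dedup_stream are hypothetical helpers, not part of any library.

```python
KEY_COLUMNS = ["Country", "Citizens"]  # assumption: both columns together identify a row
TARGET_TABLE = "stream.writestream"    # target table name taken from the post


def merge_condition(keys):
    """Build the MERGE join condition, e.g. 't.a = s.a AND t.b = s.b'."""
    return " AND ".join(f"t.{k} = s.{k}" for k in keys)


def upsert_batch(batch_df, batch_id):
    """Insert only the rows of this micro-batch that are not in the target yet."""
    # Imported lazily; the delta package ships with Databricks runtimes.
    from delta.tables import DeltaTable

    # DataFrame.sparkSession is available in PySpark 3.3 and later.
    spark = batch_df.sparkSession

    # Remove duplicates inside the batch itself (the sample file repeats India, 10).
    deduped = batch_df.dropDuplicates(KEY_COLUMNS)

    (
        DeltaTable.forName(spark, TARGET_TABLE).alias("t")
        .merge(deduped.alias("s"), merge_condition(KEY_COLUMNS))
        .whenNotMatchedInsertAll()  # insert-only: existing rows are left untouched
        .execute()
    )


def start_dedup_stream(df, checkpoint_location):
    """Start the stream with the MERGE-based sink instead of a plain append."""
    return (
        df.writeStream
        .foreachBatch(upsert_batch)
        .option("checkpointLocation", checkpoint_location)
        .queryName("writestreamquery")
        .start()
    )
```

With this in place, start_dedup_stream(df, "<new checkpoint path>") would replace the df.writeStream ... toTable(...) call above. If Country alone is the business key and Citizens should be updated when it changes, KEY_COLUMNS could be reduced to ["Country"] and a whenMatchedUpdateAll() clause added before the insert clause.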