Hi Aziz,
What you're seeing is expected behaviour when combining Spark retries with non-idempotent writes.
Spark's write path is task-based and fault-tolerant: if a task fails part-way through writing to MongoDB, Spark will retry that task.
From Spark's perspective this is correct behaviour, but MongoDB has no idea the second attempt is a "retry"; it just sees another insert.
If, at the time of the write:
- the documents carry no deterministic _id (so MongoDB assigns a fresh ObjectId on every insert), and
- there is no unique index on a business key to reject the second copy,
then the same row can be written twice when a task is retried.
Because your Delta source is clean and deduplicated, the issue isn't in Delta; it comes from the at-least-once semantics of the write path.
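
For illustration, here is a minimal sketch of the write pattern that is *not* retry-safe. It assumes the MongoDB Spark Connector 10.x ("mongodb" format); the path, database, collection and URI are placeholders, so adjust the option names to whatever connector version you actually run:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-to-mongo").getOrCreate()

# Hypothetical Delta source path
df = spark.read.format("delta").load("/mnt/silver/customers")

# Plain inserts: MongoDB assigns a new ObjectId to every document, so a
# retried task re-inserts the same rows under different _ids -> duplicates.
(df.write
   .format("mongodb")
   .mode("append")
   .option("connection.uri", "mongodb+srv://...")   # placeholder URI
   .option("database", "analytics")                 # hypothetical names
   .option("collection", "customers")
   .option("operationType", "insert")               # insert-only writes
   .save())
```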
To fix it:
- Give every row a unique, deterministic key, use it as the _id, and write via upserts so a retried task simply rewrites the same document (first sketch below), or
- Write to a staging collection and atomically rename it over the live collection, so partial or duplicated state never reaches the live data (second sketch below).
Either approach will eliminate duplicates even when Spark tasks are retried.
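
For the upsert route, a minimal sketch, assuming the 10.x connector's operationType and upsertDocument write options and using hypothetical business-key columns customer_id and event_date:

```python
from pyspark.sql import functions as F

# Derive a deterministic _id from the business key, so the "same" row always
# maps to the same document. customer_id / event_date are placeholders for
# whatever uniquely identifies a row in your Delta table.
keyed = df.withColumn(
    "_id",
    F.sha2(F.concat_ws("||", "customer_id", "event_date"), 256),
)

# replace + upsert: a retried task rewrites the same document instead of
# inserting a second copy.
(keyed.write
      .format("mongodb")
      .mode("append")
      .option("connection.uri", "mongodb+srv://...")   # placeholder URI
      .option("database", "analytics")                 # hypothetical names
      .option("collection", "customers")
      .option("operationType", "replace")   # match on _id and replace
      .option("upsertDocument", "true")     # insert when there is no match
      .save())
```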
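
For the staging + rename route, a driver-side sketch using PyMongo alongside the Spark write. The database, collection names and URI are placeholders; note that renaming with dropTarget drops the old live collection (and its indexes), so create any required indexes on the staging collection before the swap:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://...")   # placeholder URI
db = client["analytics"]                    # hypothetical database name

# 1) Start every run from an empty staging collection.
db["customers_staging"].drop()

# 2) Spark writes the snapshot into staging only; the live collection is
#    never touched by the job itself.
(df.write
   .format("mongodb")
   .mode("append")
   .option("connection.uri", "mongodb+srv://...")
   .option("database", "analytics")
   .option("collection", "customers_staging")
   .save())

# 3) Validate (and dedupe, if needed) the staging collection, then promote it
#    in one atomic rename. The live collection only ever sees a complete run.
db["customers_staging"].rename("customers", dropTarget=True)
```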