Re: Delta Live Tables has duplicates created by mu...

-werners- · 02-27-2022

Hm, hard to tell. You use a mix of PySpark and plain Python objects; perhaps that is the reason, as some of the code will be executed on the driver and the rest on the workers.

Can I ask why you use toLocalIterator and append the results to a list (df_append), which you then reduce with functools?
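To make sure we are talking about the same thing, here is a rough sketch of the pattern as I understand it. This is not your code: `build_per_key_on_driver`, `union_all`, `key_col` and `transform` are hypothetical names I made up, and I am assuming the list is reduced with `unionByName`. Only `df_append`, `toLocalIterator` and the functools reduce come from your post.

```python
from functools import reduce


def union_all(frames):
    """Fold a list of DataFrames into one using unionByName.

    unionByName does not de-duplicate, so a frame that appears twice in
    the list contributes its rows twice.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one DataFrame")
    return reduce(lambda left, right: left.unionByName(right), frames)


def build_per_key_on_driver(df, key_col, transform):
    """Assumed shape of the pattern: loop over keys on the driver,
    build one DataFrame per key, then union them all."""
    df_append = []
    # toLocalIterator() brings the rows to the driver; the loop body is
    # plain Python that runs on the driver.
    for row in df.select(key_col).distinct().toLocalIterator():
        # Each iteration only defines a lazy Spark plan; the work itself
        # happens on the workers when the final result is evaluated.
        subset = df.filter(df[key_col] == row[key_col])
        df_append.append(transform(subset))
    return union_all(df_append)
```

One thing worth checking in a loop like this: if the DataFrame you iterate over contains the same key more than once (for example without the `distinct()` above), every repeat appends another copy of that subset and the union keeps all of them. I cannot tell from here whether that is what happens in your pipeline, so treat it as a guess.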