-werners-
Esteemed Contributor III

Hm, hard to tell. You use a mix of PySpark and plain Python objects, and that may be the reason: the plain Python parts are executed on the driver, while the PySpark operations are distributed over the workers.

Can I ask why you use toLocalIterator and append the results to a list (df_append), which you then reduce with functools?