02-27-2022 11:34 PM
Hm, hard to tell. You use a mix of PySpark and plain Python objects; perhaps that is the reason, since some operations will be executed on the driver and others on the workers.
Can I ask why you use toLocalIterator and append the results to a list (df_append), which you then reduce with functools?