03-08-2024 07:19 AM - last edited on 03-08-2024 10:25 AM by Retired_mod
Hi,
It only states the reasons why I am confused by the observed behaviour in my job:
- "Only 3 countries starting in parallel: This occurs because Spark assigns one worker per partition. If you have more partitions (countries) than workers, some workers remain idle."
--> If Spark assigns one worker per partition and I have more partitions than workers, why would a worker remain idle?
- "Waiting for all workers to finish: Since Spark waits for all tasks (countries) to complete before proceeding, the largest country’s processing time affects the overall job completion."
--> Yes, my intention was that the overall job completion depends on the largest country. However, why are the partitions blocked from executing during the same function call?
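The expectation behind these two questions can be sketched in plain Python (this is not Spark code, only a toy model). It assumes that each task (country) starts as soon as any task slot is free, so no slot sits idle while work is still waiting. The function name `simulate_schedule`, the number of slots, and the durations are all made up for illustration.

```python
import heapq

def simulate_schedule(durations, n_slots):
    """Greedy scheduling: each task starts as soon as any slot is free."""
    # Heap of (time the slot becomes free, slot id); all slots are free at t=0.
    slots = [(0.0, s) for s in range(n_slots)]
    heapq.heapify(slots)
    schedule = []
    for task_id, duration in enumerate(durations):
        free_at, slot = heapq.heappop(slots)  # earliest available slot
        schedule.append((task_id, slot, free_at, free_at + duration))
        heapq.heappush(slots, (free_at + duration, slot))
    # The overall completion time is set by the task that finishes last.
    makespan = max(end for _, _, _, end in schedule)
    return schedule, makespan

# Hypothetical per-country processing times (arbitrary units), 3 task slots.
durations = [10.0, 2.0, 3.0, 1.0, 4.0]
schedule, makespan = simulate_schedule(durations, n_slots=3)
for task_id, slot, start, end in schedule:
    print(f"country {task_id}: slot {slot}, start {start}, end {end}")
print("job finishes at", makespan)
```

In this model only three countries start at time 0, but the fourth and fifth start as soon as a slot frees up, and the job still finishes when the largest country does. That is the behaviour I expected, and it is not what I observe.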
I would appreciate it if someone could actually read through the post and tell me whether there is anything I can do.