11-10-2022 05:59 PM
If a Python process does not use Spark, for example plain pandas (not pandas on Spark), only one node is used. I ran into the exact same error on a regular cluster with multiple nodes.
One solution is to use a single node with a lot of memory, such as 128 GB or more. That means allocating enough resources to a single node instead of splitting them across multiple nodes.
However, I try to avoid pandas, since most problems can be solved with Spark; the exception is the occasional special utility where there is no other choice.
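For the cases where pandas is unavoidable, here is a rough sketch of how one might check whether a dataset is likely to fit on a single large node. The function name `fits_on_single_node` and the expansion factor of 5 are my own assumptions for illustration (in-memory size relative to file size varies a lot with the data), so treat the result as a ballpark, not a guarantee.

```python
# Assumed rule of thumb, not a measured value: data loaded into pandas
# can take several times its on-disk size in memory.
ASSUMED_EXPANSION_FACTOR = 5


def fits_on_single_node(file_size_bytes, node_memory_gb,
                        expansion_factor=ASSUMED_EXPANSION_FACTOR):
    """Return True if the estimated in-memory size fits in the node's memory."""
    estimated_bytes = file_size_bytes * expansion_factor
    node_bytes = node_memory_gb * 1024 ** 3
    return estimated_bytes <= node_bytes


GIB = 1024 ** 3

# A 20 GiB file is estimated at 100 GiB in memory, under a 128 GiB node.
print(fits_on_single_node(20 * GIB, 128))

# A 30 GiB file is estimated at 150 GiB in memory, over a 128 GiB node.
print(fits_on_single_node(30 * GIB, 128))
```

If the check fails, that is a hint to either pick a larger node or move the workload to Spark so it can be spread across the cluster.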