lizou
Contributor III

If a Python process does not use Spark, for example plain pandas (not the pandas API on Spark), it runs on only one node, the driver. I ran into the exact same error on a regular cluster with multiple nodes.

One solution is to use a single node with a lot of memory, such as 128 GB or more. That means allocating enough resources to a single node instead of splitting them across multiple nodes.
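As a rough way to judge how big that single node needs to be, here is a minimal sizing sketch. The function names, the 5x overhead factor, and the 8 GB reserved for the OS and runtime are my own assumptions for illustration (a common rule of thumb is that pandas needs several times the dataset size in RAM), so adjust them to what you actually observe.

```python
def min_node_memory_gb(dataset_gb: float,
                       overhead_factor: float = 5.0,
                       reserved_gb: float = 8.0) -> float:
    """Estimate the RAM (in GB) a single node needs for a pandas workload.

    overhead_factor and reserved_gb are assumed rule-of-thumb values,
    not measured numbers.
    """
    if dataset_gb < 0 or overhead_factor <= 0 or reserved_gb < 0:
        raise ValueError("sizes must be non-negative and the factor positive")
    # pandas often holds several copies of the data while transforming it,
    # so scale the raw dataset size and add headroom for the OS and runtime.
    return dataset_gb * overhead_factor + reserved_gb


def fits_on_node(dataset_gb: float, node_memory_gb: float,
                 overhead_factor: float = 5.0,
                 reserved_gb: float = 8.0) -> bool:
    """Return True if the estimated requirement fits in the node's memory."""
    needed = min_node_memory_gb(dataset_gb, overhead_factor, reserved_gb)
    return needed <= node_memory_gb


if __name__ == "__main__":
    for size_gb in (10, 20, 30):
        print(size_gb, "GB dataset fits on a 128 GB node:",
              fits_on_node(size_gb, 128))
```

If the estimate does not fit on the largest node you can get, that is usually the sign to move the job to Spark rather than keep growing the node.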

However, I try to avoid pandas, as most problems can be solved with Spark, except for some special utilities where there is no other choice.