How to pick the right cluster for your workflow

Aidzillafont
New Contributor II

Hi All,

I am trying to run a workflow on several job clusters, including general-purpose and memory-optimized ones. My main bottleneck is that data spills to disk because the large dataset I need to load does not fit in RAM.

I cannot reduce the size of the dataset, but the computations themselves are straightforward.

How should I go about selecting the appropriate cluster? Is there a useful guide for choosing the right cluster that I could follow going forward?
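
To make the question concrete, here is a rough sketch of the kind of sizing check I have in mind. Everything in it is an assumption for illustration: the 500 GB dataset size, the 1.5x in-memory overhead factor, the 60% usable-memory fraction, and the two node types are hypothetical placeholders, not measured values or real instance names.

```python
import math


def workers_needed(dataset_gb, node_ram_gb, overhead_factor=1.5, usable_fraction=0.6):
    """Estimate how many workers are needed to hold a dataset in memory.

    overhead_factor: assumed growth of the data once deserialized in memory.
    usable_fraction: assumed share of node RAM actually available for data.
    Both defaults are illustrative guesses, not tuned values.
    """
    if dataset_gb <= 0 or node_ram_gb <= 0:
        raise ValueError("dataset_gb and node_ram_gb must be positive")
    required_gb = dataset_gb * overhead_factor
    usable_per_node_gb = node_ram_gb * usable_fraction
    return math.ceil(required_gb / usable_per_node_gb)


# Hypothetical node types (name -> RAM in GB), for comparison only.
candidates = {
    "general_purpose_64gb": 64,
    "memory_optimized_128gb": 128,
}

# Hypothetical dataset size in GB.
dataset_gb = 500

for name, ram_gb in candidates.items():
    print(f"{name}: about {workers_needed(dataset_gb, ram_gb)} workers")
```

Is this roughly the right way to reason about it, or is there a better rule of thumb for the overhead and usable-memory numbers?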

Thanks,

Aidzillafont