Hi All,
I am attempting to execute a workflow on various job clusters, including general-purpose and memory-optimized clusters. My main bottleneck is that data is being written to disk because I'm running out of RAM. This is due to the large dataset that I need to load into memory.
The size of the dataset is unavoidable, but the computations are straightforward.
How should I go about selecting the appropriate cluster? Is there a useful guide for choosing the right cluster that I could follow going forward?
Thanks,
Aidzillafont