Hello there!
I am writing this open message to ask how you are using compute resources in your own work.
Currently, in my company, we have multiple compute instances that can be differentiated into two main types:
- Clusters built on a large instance (e.g., m5d.10xlarge) for batch processing. This type of cluster is typically used to process a large number of files in parallel.
- Clusters built on a small instance (e.g., m5d.xlarge) for each data scientist on the team. This type of cluster is typically used for analysis, so a large instance is not needed.
For the first type, we have a clear understanding. One large compute instance for batch processing is sufficient for our current needs.
For the second type, we are unsure which option is better:
- To have small compute instances for each data scientist in the team (e.g., m5d.xlarge instance).
- To have a larger compute instance that is used by all data scientists simultaneously (e.g., m5d.10xlarge instance).
With the second option, we are unsure how to partition the compute instance among the data scientists on the team so that each one receives the same number of cores and the same amount of RAM, ensuring that everyone has equal computational power. This is why we have so far chosen one small compute instance per data scientist, but we are not certain what the best practices are in this situation.
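For the equal-split question above, the arithmetic itself is simple. Here is a minimal sketch, assuming the published specs of the instances mentioned (an m5d.10xlarge has 40 vCPUs and 160 GiB of RAM; an m5d.xlarge has 4 vCPUs and 16 GiB). In practice the shares would still need to be enforced by some isolation layer (e.g., container resource limits), which this sketch does not cover.

```python
def per_user_share(total_vcpus: int, total_ram_gib: int, n_users: int) -> tuple[int, int]:
    """Equal share of vCPUs and RAM each user gets from one shared instance."""
    return total_vcpus // n_users, total_ram_gib // n_users

# Example: 10 data scientists sharing one m5d.10xlarge (40 vCPUs, 160 GiB).
vcpus, ram_gib = per_user_share(40, 160, 10)
print(vcpus, ram_gib)  # 4 vCPUs and 16 GiB each -- the same as one m5d.xlarge per person
```

Note that with 10 users the shared 10xlarge gives each person exactly an xlarge's worth of resources, so the trade-off is less about raw capacity and more about isolation, noisy neighbors, and whether idle capacity can be borrowed by whoever needs it.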
Therefore, we would appreciate your recommendation on which of these two options is most suitable. Perhaps there are alternative solutions that we have overlooked.
Thank you in advance for your time, team!