You'll need to experiment with those two options, but since initial iterations of training a machine learning model are often exploratory, a smaller cluster is a good choice; it also reduces the impact of shuffles. Recommended worker types are storage optimized with Delta Caching enabled, which accounts for repeated reads of the same data and lets the training data be cached. If the compute and storage offered by storage-optimized nodes are not sufficient, consider GPU-optimized nodes; a possible downside is that these nodes lack Delta Caching support.
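For reference, here's a minimal sketch of what relying on Delta Caching looks like from a notebook. The Spark conf key is the documented `spark.databricks.io.cache.enabled` setting; the table path below is hypothetical:

```python
# `spark` is the SparkSession already available in a Databricks notebook.
# Force the Delta cache on explicitly (on storage-optimized node types
# it is typically enabled by default).
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Repeated reads of the same Delta table, which is the usual pattern
# during iterative model training, hit local SSD after the first pass.
df = spark.read.format("delta").load("/mnt/training/features")  # hypothetical path
df.count()  # first read populates the cache
df.count()  # subsequent reads are served from the cache
```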
Also, if you are training deep learning models, check the best practices linked below, and, if you are using PyTorch, the new TorchDistributor (see the sketch after the link).
https://learn.microsoft.com/en-us/azure/databricks/machine-learning/train-model/dl-best-practices
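A minimal sketch of driving a training function with TorchDistributor (available in PySpark 3.4+ / Databricks Runtime 13.x ML) might look like the following; the training body is a placeholder, and the process count is an assumption you'd tune to your cluster:

```python
from pyspark.ml.torch.distributor import TorchDistributor

def train_fn(learning_rate):
    # Placeholder training body: TorchDistributor launches this function
    # on each process with the torch.distributed env vars already set.
    import torch.distributed as dist
    dist.init_process_group("gloo")  # use "nccl" on GPU clusters
    # ... build the model, wrap it in DistributedDataParallel, train ...
    dist.destroy_process_group()
    return "done"

# num_processes is the total number of parallel training processes;
# set use_gpu=True on GPU-optimized clusters.
distributor = TorchDistributor(num_processes=2, local_mode=False, use_gpu=False)
result = distributor.run(train_fn, 1e-3)
```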
PS: Check out the #DAIS2023 talks; the creator of PyTorch is giving a keynote.