Using Ganglia you can monitor how busy the GPU(s) are. Increasing the batch size will increase that utilization. Larger batches also produce more accurate gradient estimates, so each update improves the model more (up to a point). That in turn can allow training to use a higher learning rate and converge in fewer steps.
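One common heuristic for pairing a larger batch with a higher learning rate is linear scaling: grow the rate in proportion to the batch size. A minimal sketch, assuming a base rate that was tuned at some reference batch size (the numbers below are illustrative, not recommendations):

```python
def scaled_lr(base_lr, base_batch_size, batch_size):
    """Linear scaling heuristic: scale the learning rate by the
    same factor as the batch size relative to its reference value."""
    return base_lr * batch_size / base_batch_size

# Hypothetical example: base rate 0.1 tuned at batch size 256.
# Quadrupling the batch size quadruples the learning rate.
print(scaled_lr(0.1, 256, 1024))
```

In practice this heuristic holds only up to a point; very large batches usually also need a warmup period for the learning rate.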
https://databricks.com/blog/2019/08/15/how-not-to-scale-deep-learning-in-6-easy-steps.html