I am currently experimenting with the Whisper model for batch inference on Databricks and have successfully run multiple instances of the model across the GPUs available on the driver node. However, I cannot figure out how to leverage the GPUs on the worker nodes, which I am currently unable to access at all. I have come across documentation on using all worker nodes with PySpark-based libraries, but I am specifically interested in how to achieve this with a transformer model like Whisper. Any insights or suggestions would be greatly appreciated.
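For context, here is a minimal sketch of the direction I have been exploring: a round-robin mapping of Spark tasks onto each node's local GPUs, wrapped in a `mapInPandas` function so each partition loads its own model instance. The device-assignment helper is plain Python; the `whisper.load_model`/`transcribe` calls follow the `openai-whisper` API as I understand it, and the `mapInPandas` wiring, the `path` column name, and the `gpus_per_node=4` value are all assumptions on my part, not tested code.

```python
def assign_device(task_index: int, gpus_per_node: int) -> str:
    """Round-robin a Spark task onto one of its node's local GPUs."""
    return f"cuda:{task_index % gpus_per_node}"

def transcribe_partitions(batches):
    """Runs once per partition on a worker node (hypothetical wiring)."""
    import whisper                     # assumed installed on every worker
    from pyspark import TaskContext

    # Pin this partition's model to one GPU on the worker it lands on.
    device = assign_device(TaskContext.get().partitionId(), gpus_per_node=4)
    model = whisper.load_model("base", device=device)
    for pdf in batches:                # pandas DataFrames, one per Arrow batch
        pdf["text"] = [model.transcribe(p)["text"] for p in pdf["path"]]
        yield pdf

# Intended use (untested):
# result = (df.repartition(num_workers * 4)
#             .mapInPandas(transcribe_partitions,
#                          schema="path string, text string"))
```

My uncertainty is mainly whether the per-partition model loading and GPU pinning shown here is the recommended way to reach worker GPUs, or whether something like TorchDistributor is the intended mechanism.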