Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

How can I utilize multiple GPUs from multiple nodes in Databricks

Mathew
New Contributor

I am currently experimenting with the Whisper model for batch inference on Databricks and have successfully run multiple instances of the model by assigning them to the GPUs available on the driver node. However, I am unsure how to leverage the GPUs on the worker nodes, since I am unable to access them from the driver. I have come across documentation on using all worker nodes with PySpark-based libraries, but I am specifically interested in how to achieve this with a transformer model like Whisper. Any insights or suggestions would be greatly appreciated.
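
For context, a simplified sketch of the kind of per-GPU setup I currently have on the driver (the model size, GPU count, and audio paths below are just placeholders):

    import whisper
    from concurrent.futures import ThreadPoolExecutor

    # One Whisper instance pinned to each GPU visible on the driver node
    NUM_GPUS = 2  # placeholder: number of GPUs on the driver
    models = [whisper.load_model("medium", device=f"cuda:{i}") for i in range(NUM_GPUS)]

    # Placeholder batch of audio files, split round-robin across the instances
    audio_files = ["clip_0.flac", "clip_1.flac", "clip_2.flac", "clip_3.flac"]
    shards = [audio_files[i::NUM_GPUS] for i in range(NUM_GPUS)]

    def transcribe_shard(model, files):
        return [model.transcribe(f)["text"] for f in files]

    with ThreadPoolExecutor(max_workers=NUM_GPUS) as pool:
        results = list(pool.map(transcribe_shard, models, shards))

This works on the driver, but I do not see how to extend the same per-GPU pattern to the worker nodes.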

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Mathew, leveraging multiple GPUs for batch inference with the Whisper model on Databricks can significantly improve throughput. While Whisper typically runs on a single GPU, there is a workaround to split the model across two GPUs: one for the encoder and another for the decoder. Here's how you can achieve this:

  1. Update the Whisper Package: First, ensure that you have the latest commit of the Whisper package. You can update it using the following command:

    pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
    
  2. Load the Model and Distribute GPUs: In your Python code, load the Whisper model (e.g., "large") and move the encoder and decoder to separate GPUs as follows:

    import whisper
    
    # Load the model (initially on CPU)
    model = whisper.load_model("large", device="cpu")
    
    # Move the encoder to the first GPU (cuda:0)
    model.encoder.to("cuda:0")
    
    # Move the decoder to the second GPU (cuda:1)
    model.decoder.to("cuda:1")
    
    # Register hooks to manage data flow between GPUs
    model.decoder.register_forward_pre_hook(
        lambda _, inputs: tuple([inputs[0].to("cuda:1"), inputs[1].to("cuda:1")] + list(inputs[2:]))
    )
    model.decoder.register_forward_hook(
        lambda _, inputs, outputs: outputs.to("cuda:0")
    )
    
    # Perform inference (e.g., transcribe an audio file)
    model.transcribe("jfk.flac")
    

    The code above uses register_forward_pre_hook to move the decoder's inputs to the second GPU ("cuda:1") and register_forward_hook to move the results back to the first GPU ("cuda:0"). The latter is not strictly necessary, but it serves as a workaround because the decoding logic assumes the outputs are on the same device as the encoder.

  3. Check VRAM Usage: After executing the snippet above, check the VRAM usage on your 2-GPU machine; you should see the load distributed across both GPUs.
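
    For a quick sanity check from the same notebook, you can print the memory allocated on each device after the transcription call (a minimal sketch using PyTorch's built-in counters; running nvidia-smi on the driver also shows per-GPU usage):

    import torch

    # Report the VRAM currently allocated by PyTorch on each visible GPU
    for i in range(torch.cuda.device_count()):
        allocated_gb = torch.cuda.memory_allocated(i) / 1024 ** 3
        print(f"cuda:{i}: {allocated_gb:.2f} GiB allocated")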

 