Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

How can I utilize multiple GPUs from multiple nodes in Databricks

Mathew
New Contributor

I am currently experimenting with the Whisper model for batch inference on Databricks and have successfully run multiple instances of the model across the GPUs available on the driver node. However, I would also like to leverage the GPUs on each of the worker nodes, which I am currently unable to access. I have come across documentation on using all worker nodes with PySpark-based libraries, but I am specifically interested in how to achieve this with a transformer model like Whisper. Any insights or suggestions would be greatly appreciated.

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Mathew, leveraging multiple GPUs for batch inference with the Whisper model on Databricks can significantly enhance performance. While the Whisper model typically runs on a single GPU, there is a workaround to utilize two GPUs on one machine: one for the encoder and another for the decoder. Here's how you can achieve this:

  1. Update the Whisper Package: First, ensure that you have the latest commit of the Whisper package. You can update it using the following command:

    pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
    
  2. Load the Model and Distribute GPUs: In your Python code, load the Whisper model (e.g., “large”) and distribute the GPUs as follows:

    import whisper
    
    # Load the model (initially on CPU)
    model = whisper.load_model("large", device="cpu")
    
    # Move the encoder to the first GPU (cuda:0)
    model.encoder.to("cuda:0")
    
    # Move the decoder to the second GPU (cuda:1)
    model.decoder.to("cuda:1")
    
    # Register hooks to manage data flow between GPUs
    model.decoder.register_forward_pre_hook(
        lambda _, inputs: tuple([inputs[0].to("cuda:1"), inputs[1].to("cuda:1")] + list(inputs[2:]))
    )
    model.decoder.register_forward_hook(
        lambda _, inputs, outputs: outputs.to("cuda:0")
    )
    
    # Perform inference (e.g., transcribe an audio file)
    model.transcribe("jfk.flac")
    

    The code above uses register_forward_pre_hook to move the decoder's inputs to the second GPU ("cuda:1") and register_forward_hook to move the results back to the first GPU ("cuda:0"). The latter is not strictly necessary, but it works around the fact that the decoding logic assumes the outputs are on the same device as the encoder. A minimal, Whisper-independent illustration of this hook pattern is sketched after this list.

  3. VRAM Usage: After running the snippet above, check the VRAM usage on your 2-GPU machine; you should see the memory split across both devices, with the encoder's weights on cuda:0 and the decoder's on cuda:1 (a quick way to check is sketched below).
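
If the hook mechanics are unclear, here is a minimal illustration of the same pattern on a toy PyTorch module rather than Whisper (it assumes a machine with at least two CUDA devices): a forward pre-hook moves a module's inputs onto its device before it runs, and a forward hook moves the output back afterwards.

    import torch
    import torch.nn as nn
    
    # Two layers deliberately placed on different GPUs
    first = nn.Linear(8, 8).to("cuda:0")
    second = nn.Linear(8, 8).to("cuda:1")
    
    # Before `second` runs, move its inputs onto cuda:1
    second.register_forward_pre_hook(
        lambda module, inputs: tuple(t.to("cuda:1") for t in inputs)
    )
    
    # After `second` runs, move its output back to cuda:0 for downstream code
    second.register_forward_hook(
        lambda module, inputs, output: output.to("cuda:0")
    )
    
    x = torch.randn(4, 8, device="cuda:0")
    y = second(first(x))
    print(y.device)  # cuda:0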
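
To confirm the split, you can print the per-device memory that PyTorch has allocated. The helper below is an illustrative sketch using the standard torch.cuda memory APIs; it is not something Whisper or Databricks provides.

    import torch
    
    # Illustrative helper (not part of Whisper): report how much memory
    # PyTorch has allocated and reserved on each visible GPU.
    def print_vram_usage():
        for i in range(torch.cuda.device_count()):
            allocated = torch.cuda.memory_allocated(i) / 1024**3
            reserved = torch.cuda.memory_reserved(i) / 1024**3
            print(f"cuda:{i}: {allocated:.2f} GiB allocated, {reserved:.2f} GiB reserved")
    
    print_vram_usage()

Alternatively, running nvidia-smi on the driver node shows overall GPU memory use and utilization while the transcription is in progress.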

 
