Hello Databricks community!
We have a strong need to serve some public models and our own private models on GPU clusters, and we have several requirements:
1) We'd like to be able to start and stop the endpoints (ideally on a schedule) to avoid excess consumption
2) We'd like the endpoint to have a static address
3) (optional) We'd like to be able to run several models on one cluster (to use the GPU more efficiently)
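For context, this is roughly what we have in mind for requirement 1, assuming the Model Serving REST API (`PUT /api/2.0/serving-endpoints/{name}/config`) with scale-to-zero; the model and endpoint names below are just illustrative placeholders:

```python
import json

# Hypothetical sketch: build the config payload we'd send to the Databricks
# serving-endpoints REST API to enable scale-to-zero on a GPU endpoint,
# so it stops consuming GPU capacity when idle. All names are placeholders.
def build_endpoint_config(model_name: str, model_version: str) -> dict:
    return {
        "served_entities": [
            {
                "entity_name": model_name,        # e.g. a Unity Catalog model path
                "entity_version": model_version,
                "workload_type": "GPU_SMALL",     # GPU workload type (assumed)
                "workload_size": "Small",
                "scale_to_zero_enabled": True,    # release GPU when idle
            }
        ]
    }

config = build_endpoint_config("my_catalog.models.my_llm", "1")
print(json.dumps(config, indent=2))
```

We could then drive this from a scheduled job to stop and start the endpoint outside working hours, if that is a supported pattern.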
As far as we know, you offer GPU clusters and Databricks Container Services. The question: is it possible to run a Docker container (or a group of containers) on such a cluster and expose it as an endpoint?
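To illustrate what we mean, here is a sketch of the cluster spec we imagine sending to the Clusters API (`POST /api/2.1/clusters/create`) with a custom image via Container Services; the image URL, node type, and runtime version are assumptions, not something we have verified:

```python
import json

# Hedged sketch: a Clusters API payload that uses Databricks Container
# Services ("docker_image") to run a custom Docker image on a GPU node type.
# All concrete values (image URL, node type, runtime) are placeholders.
def build_gpu_cluster_spec(image_url: str) -> dict:
    return {
        "cluster_name": "gpu-serving-poc",            # placeholder name
        "spark_version": "13.3.x-gpu-ml-scala2.12",   # example GPU ML runtime
        "node_type_id": "Standard_NC6s_v3",           # example Azure GPU node
        "num_workers": 0,                             # single node for a PoC
        "docker_image": {
            "url": image_url,
            # "basic_auth": {...},  # needed if the registry is private
        },
    }

spec = build_gpu_cluster_spec("myregistry.azurecr.io/serving:latest")
print(json.dumps(spec, indent=2))
```

What we don't know is whether a container started this way can listen on a port and be reached at a stable address from outside the cluster, which is the part we'd like advice on.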
We know that most of the GPU services are either in preview or in beta; still, we would appreciate any advice. Right now we use Databricks on Azure for purposes other than ML, but we would love to start hosting our ML models on your platform as well.
Could you suggest possible approaches based on your experience?
Thank you 🙂