01-13-2025 08:44 AM
At my org, when we start a Databricks cluster, it often takes a while to become available (due to (1) instance provisioning, (2) library loading, and (3) init script execution). I'm exploring whether an instance pool could be a viable strategy for improving cluster startup time.
I see there's a feature called "preloaded_docker_images" (https://docs.databricks.com/api/workspace/instancepools/get#preloaded_docker_images), but the docs are limited. Is there canonical documentation that explains:
01-13-2025 08:59 AM
Hi @mrstevegross,
About your cluster startup time, how long does it take to come up?
When you specify a Docker image for your Databricks cluster, the entire cluster runs within that Docker container. This means that all Spark jobs executed on the cluster will run inside the specified Docker container.
Please be aware of some limitations: https://docs.databricks.com/en/compute/custom-containers.html
01-13-2025 09:02 AM
>When you specify a Docker image for your Databricks cluster, the entire cluster runs within that Docker container.
Just to clarify: are you saying that the Databricks job request itself says which container to use?
>Please be aware of some limitations: https://docs.databricks.com/en/compute/custom-containers.html
Roger that, reading docs now.
Thanks!
01-13-2025 09:25 AM
>Just to clarify: are you saying that the Databricks job request itself says which container to use?
I see here (https://docs.databricks.com/api/workspace/clusters/create#docker_image) that the create-cluster request can include an image-to-load. How does that interact with the instance pool's "preloaded_docker_images" feature?
01-13-2025 11:39 AM
Hi @mrstevegross, not exactly: the image should be specified in the API request.
When you create a cluster using an instance pool with preloaded Docker images, the cluster can use one of the preloaded images if it matches the docker_image specified in the create-cluster request. If the specified docker_image is not preloaded in the instance pool, the cluster will load the specified image, which may take additional time.
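To make that concrete, here is a minimal sketch of the two request bodies involved, assuming the pool is created via POST /api/2.0/instance-pools/create and the cluster via POST /api/2.1/clusters/create. The image URL, pool name, node type, Spark version, and pool ID below are hypothetical placeholders, and the `is_preloaded` helper is only an illustration that assumes matching is done on the image URL; it is not how Databricks itself necessarily decides.

```python
import json

# Hypothetical image URL; substitute your own registry/image:tag.
IMAGE_URL = "my-registry.example.com/my-team/spark-runtime:1.0"

# Request body for creating the instance pool.
# Name and node type are placeholders for illustration.
pool_request = {
    "instance_pool_name": "docker-warm-pool",
    "node_type_id": "i3.xlarge",
    "min_idle_instances": 1,
    "preloaded_docker_images": [{"url": IMAGE_URL}],
}


def cluster_request(instance_pool_id, image_url):
    """Build a create-cluster body that draws nodes from the pool.

    The node type comes from the pool, so it is not repeated here.
    """
    return {
        "cluster_name": "docker-on-pool",       # placeholder
        "spark_version": "15.4.x-scala2.12",    # placeholder
        "instance_pool_id": instance_pool_id,
        "num_workers": 2,
        "docker_image": {"url": image_url},
    }


def is_preloaded(pool, cluster):
    """Illustrative check only: assumes images are matched by URL."""
    preloaded = {img["url"] for img in pool.get("preloaded_docker_images", [])}
    return cluster.get("docker_image", {}).get("url") in preloaded


request = cluster_request("pool-id-from-create-response", IMAGE_URL)
print(json.dumps(request, indent=2))
print("image preloaded in pool:", is_preloaded(pool_request, request))
```

If the cluster request names an image that is not in the pool's preloaded_docker_images list, the helper returns False, which corresponds to the slower path where the image has to be pulled at cluster start.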
01-13-2025 11:48 AM
> if it matches the docker_image specified in the create-cluster request.
Aha, good to know. Can y'all update the reference docs to clarify these semantics?
01-13-2025 12:48 PM
Sure, I will ask the team in charge of the docs to review it.
2 weeks ago - last edited 2 weeks ago
Hello, when we specify a Docker image with credentials in the instance pool configuration, should we also specify the credentials in the cluster configuration, given that the image has already been pulled onto the pool instance?