4 weeks ago
At my org, when we start a Databricks cluster, it often takes a while to become available (due to (1) instance provisioning, (2) library loading, and (3) init script execution). I'm exploring whether an instance pool could be a viable strategy for improving cluster startup time.
I see there's a feature called "preloaded_docker_images" (https://docs.databricks.com/api/workspace/instancepools/get#preloaded_docker_images), but the docs are limited. Is there canonical documentation that explains:
4 weeks ago
Hi @mrstevegross,
About your cluster startup time, how long does it take to come up?
When you specify a Docker image for your Databricks cluster, the entire cluster runs within that Docker container. This means that all Spark jobs executed on the cluster will run inside the specified Docker container.
Please be aware of some limitations: https://docs.databricks.com/en/compute/custom-containers.html
4 weeks ago
>When you specify a Docker image for your Databricks cluster, the entire cluster runs within that Docker container.
Just to clarify: are you saying that the Databricks job request itself says which container to use?
>Please be aware of some limitations: https://docs.databricks.com/en/compute/custom-containers.html
Roger that, reading docs now.
Thanks!
4 weeks ago
>Just to clarify: are you saying that the Databricks job request itself says which container to use?
I see here (https://docs.databricks.com/api/workspace/clusters/create#docker_image) that the create-cluster request can include an image-to-load. How does that interact with the instance pool's "preloaded_docker_images" feature?
4 weeks ago
Hi @mrstevegross, not exactly. The image should be specified in the API request.
When you create a cluster using an instance pool with preloaded Docker images, the cluster can use one of the preloaded images if it matches the docker_image specified in the create-cluster request. If the specified docker_image is not preloaded in the instance pool, the cluster will load the specified image, which may take additional time.
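To make the matching behavior concrete, here is a minimal sketch in Python that builds the two request bodies and checks whether a cluster's image is among the pool's preloaded ones. The field names (`preloaded_docker_images`, `docker_image`, `url`, `instance_pool_id`) come from the API reference pages linked in this thread. The image URL, node type, Spark version, and pool ID are placeholders. The helper `uses_preloaded_image` is hypothetical, and comparing by exact URL is my assumption about what "matches" means, not documented behavior. Nothing here calls the API.

```python
import json

# Placeholder values for illustration only; substitute your own.
IMAGE = "my-registry.example.com/team/runtime:1.0"
OTHER_IMAGE = "my-registry.example.com/team/runtime:2.0"


def pool_spec(name, node_type_id, image_url):
    """Body for an instance pool create request that preloads one image."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        "min_idle_instances": 1,
        "preloaded_docker_images": [{"url": image_url}],
    }


def cluster_spec(name, pool_id, spark_version, image_url):
    """Body for a create-cluster request that draws nodes from a pool."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "instance_pool_id": pool_id,
        "num_workers": 1,
        "docker_image": {"url": image_url},
    }


def uses_preloaded_image(pool, cluster):
    """Hypothetical check: is the cluster's image URL in the pool's preloaded list?

    Assumes matching is by exact URL string, which the docs do not confirm.
    """
    wanted = cluster.get("docker_image", {}).get("url")
    preloaded = {img["url"] for img in pool.get("preloaded_docker_images", [])}
    return wanted is not None and wanted in preloaded


pool = pool_spec("docker-pool", "i3.xlarge", IMAGE)
matching_cluster = cluster_spec("job-a", "pool-placeholder-id", "14.3.x-scala2.12", IMAGE)
other_cluster = cluster_spec("job-b", "pool-placeholder-id", "14.3.x-scala2.12", OTHER_IMAGE)

print(json.dumps(pool, indent=2))
print(uses_preloaded_image(pool, matching_cluster))
print(uses_preloaded_image(pool, other_cluster))
```

In this sketch, `job-a` requests the same image the pool preloads, so it would be the fast case. `job-b` requests a different tag, so per the explanation above the cluster would still start but would have to load the image first.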
4 weeks ago
> if it matches the docker_image specified in the create-cluster request.
Aha, good to know. Can y'all update the reference docs to clarify these semantics?
4 weeks ago
Sure, I will ask the team in charge of the docs to review it.