
preloaded_docker_images: how do they work?

mrstevegross
New Contributor III

At my org, when we start a Databricks cluster, it often takes a while to become available (due to (1) instance provisioning, (2) library loading, and (3) init script execution). I'm exploring whether an instance pool could be a viable strategy for improving cluster startup time.

I see there's a feature called "preloaded_docker_images" (https://docs.databricks.com/api/workspace/instancepools/get#preloaded_docker_images), but the docs are limited. Is there canonical documentation that explains:

  1. When are the docker images loaded in the lifecycle of the instance?
  2. Given that you can supply N images, how does container isolation work? I only need to load one container, but it's unclear to me whether my Spark job will run "inside" that container. (Given that I'm using the container to preload jars, I'm pretty sure the Spark job needs to be able to see those jars!)
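
For context, here's roughly the pool config I'm experimenting with via the REST API (a minimal sketch; workspace URL, token, node type, and image/registry details are all placeholders):

```python
# Sketch: create an instance pool that preloads a custom Docker image, so
# instances come out of the pool with the image already pulled and cached.
import requests

HOST = "https://<my-workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                      # placeholder PAT

pool_payload = {
    "instance_pool_name": "preloaded-image-pool",
    "node_type_id": "i3.xlarge",                       # example node type
    "min_idle_instances": 2,
    "idle_instance_autotermination_minutes": 60,
    "preloaded_docker_images": [
        {
            "url": "my-registry.example.com/spark-with-jars:1.0",  # placeholder image
            "basic_auth": {"username": "<registry-user>", "password": "<registry-token>"},
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.0/instance-pools/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=pool_payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new instance_pool_id
```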
6 REPLIES

Alberto_Umana
Databricks Employee

Hi @mrstevegross,

About your cluster startup time, how long does it take to come up?

    • 1. Docker images specified in the preloaded_docker_images field are loaded when the instance pool is created or when instances are added to the pool. This means that the images are pulled and cached on the instances before they are used for running jobs.

When you specify a Docker image for your Databricks cluster, the entire cluster runs within that Docker container. This means that all Spark jobs executed on the cluster will run inside the specified Docker container.

    • 2. Since your Spark job runs inside the Docker container, it will have access to any jars or libraries that are preloaded within that container. This ensures that your Spark job can see and use the preloaded jars as expected.

Please be aware of some limitations: https://docs.databricks.com/en/compute/custom-containers.html
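
As a quick sanity check (a sketch only; the path below is a placeholder for wherever your Dockerfile copies the jars), you can confirm from a notebook on the container-backed cluster that the image's filesystem, and therefore the preloaded jars, is what the driver sees:

```python
# Runs on the driver, which itself runs inside your custom container, so the
# directory listing reflects the image's filesystem.
import os

jar_dir = "/opt/my-preloaded-jars"  # placeholder: path used in your Dockerfile
for name in sorted(os.listdir(jar_dir)):
    print(name)
```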

mrstevegross
New Contributor III

>When you specify a Docker image for your Databricks cluster, the entire cluster runs within that Docker container. 

Just to clarify: are you saying that the Databricks job request itself says which container to use?

>Please be aware of some limitations: https://docs.databricks.com/en/compute/custom-containers.html

Roger that, reading docs now.

Thanks!

mrstevegross
New Contributor III

>Just to clarify: are you saying that the Databricks job request itself says which container to use?

I see here (https://docs.databricks.com/api/workspace/clusters/create#docker_image) that the create-cluster request can include an image-to-load. How does that interact with the instance pool's "preloaded_docker_images" feature?

Alberto_Umana
Databricks Employee

Hi @mrstevegross, not exactly; the image should come with the cluster-create API request itself.

When you create a cluster using an instance pool with preloaded Docker images, the cluster can use one of the preloaded images if it matches the docker_image specified in the create-cluster request. If the specified docker_image is not preloaded in the instance pool, the cluster will load the specified image, which may take additional time.
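
For illustration, a cluster-create request against such a pool might look like the sketch below (workspace URL, token, pool ID, and image details are placeholders); the key point is that docker_image.url should match the URL in the pool's preloaded_docker_images so the cached copy is reused:

```python
# Sketch: create a cluster from the instance pool, pointing docker_image at the
# same image URL that was preloaded into the pool.
import requests

HOST = "https://<my-workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                      # placeholder PAT

cluster_payload = {
    "cluster_name": "pool-backed-container-cluster",
    "spark_version": "15.4.x-scala2.12",               # example DBR version
    "instance_pool_id": "<instance-pool-id>",          # pool with the preloaded image
    "num_workers": 2,
    "docker_image": {
        # Matches the pool's preloaded_docker_images entry; a non-matching URL
        # would be pulled at cluster start instead, adding startup time.
        "url": "my-registry.example.com/spark-with-jars:1.0",
        "basic_auth": {"username": "<registry-user>", "password": "<registry-token>"},
    },
}

resp = requests.post(
    f"{HOST}/api/2.1/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster_id
```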

mrstevegross
New Contributor III

> if it matches the docker_image specified in the create-cluster request.

Aha, good to know. Can y'all update the reference docs to clarify these semantics?

Alberto_Umana
Databricks Employee

Sure, I will inform the team in charge of those docs so they can review them.
