Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Serving Endpoint: Container image creation

Dnirmania
Contributor

Hi Team

Whenever I try to create an endpoint from a model in Databricks, the process often gets stuck at the 'Container Image Creation' step. I've tried to understand what happens during this step, but couldn't find any detailed or helpful information. Can someone explain the full sequence of steps Databricks performs in the background when serving a model endpoint?

Thanks,

Dinesh

1 ACCEPTED SOLUTION


Vidhi_Khaitan
Databricks Employee

Hi @Dnirmania,

Below is a detailed, sequenced breakdown of what happens in Databricks when you create a model serving endpoint:

1. Model Logging and Registration

  • You first log your trained model using MLflow in a compatible format, such as a built-in MLflow flavor (e.g., sklearn, pytorch, custom pyfunc, etc.).
  • Optionally, additional files such as requirements.txt (pip), conda.yaml, and code dependencies are packaged with the model. This step can also include specifying custom pip/conda environments and code paths.
  • The logged model is registered in Unity Catalog or Workspace Model Registry.
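The logging step above can be sketched as follows. This is a minimal, hypothetical example (model name, catalog path, and versions are illustrative, not from the original post); pinning exact versions in `pip_requirements` gives the later container build a deterministic environment to install.

```python
# Hypothetical sketch: log a scikit-learn model with pinned dependencies and
# register it in Unity Catalog. All names below are illustrative.

registered_name = "main.ml_models.churn_classifier"  # catalog.schema.model

pip_requirements = [
    "scikit-learn==1.4.2",
    "pandas==2.2.2",
]

def log_and_register(model):
    """Log `model` under the MLflow sklearn flavor and register it in
    Unity Catalog (requires mlflow and a configured tracking server)."""
    import mlflow
    mlflow.set_registry_uri("databricks-uc")
    with mlflow.start_run():
        mlflow.sklearn.log_model(
            sk_model=model,
            artifact_path="model",
            pip_requirements=pip_requirements,
            registered_model_name=registered_name,
        )
```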

 

2. Endpoint Creation Request

  • You initiate endpoint creation, specifying which registered model version (and compute type: CPU, GPU, etc.) should be served.
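A creation request of this shape can also be sent to the Serving REST API (`POST /api/2.0/serving-endpoints`). The payload below is a hedged sketch; the endpoint name, model name, and workload size are placeholders, and the actual HTTP call (with a bearer token) is left commented out.

```python
import json

# Hypothetical request body for POST /api/2.0/serving-endpoints.
# Names and sizes are illustrative.
endpoint_config = {
    "name": "churn-endpoint",
    "config": {
        "served_entities": [
            {
                "entity_name": "main.ml_models.churn_classifier",
                "entity_version": "1",
                "workload_type": "CPU",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
}
body = json.dumps(endpoint_config)
# A real call would send `body` with an Authorization: Bearer <token> header,
# e.g. via urllib.request or the databricks-sdk.
```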

 

3. Background Infrastructure Orchestration Begins

Internally, a state machine-driven workflow (the control plane) handles endpoint provisioning, with the major next step being Container Image Creation.

 

 

4. Container Image Creation: Step-by-Step Technical Workflow

a. Gathering Model Artifacts and Environment Metadata

  • The control plane retrieves your model's artifacts (model, code, dependency files) and extracts environment specifications (pip/conda requirements, code paths, etc.).

b. Triggering the Container Build

  • The system launches a container build job to package your model and environment into a Docker container.
    • On Databricks AWS deployments, this typically means invoking a builder workflow like AWS CodeBuild. Alternative Kubernetes-based isolation may also be used.

c. Container Build Steps (Inside the Builder Job)

  • i. Start with a Databricks-maintained base image (usually Ubuntu plus minimal Python/ML tooling).
  • ii. Copy in your model artifacts.
  • iii. Install all pip or conda dependencies declared in the model artifacts.
    • For MLflow native flavor models, dependencies are extracted from automatically-captured files (pip_requirements.txt or conda.yaml).
    • For custom models, extra requirements and code dependencies are also installed as needed.
  • iv. Install the Databricks scoring server (MLflow server or Databricks' custom inference server) that will actually serve prediction requests inside the container.
  • v. For GPU endpoints, install specialized CUDA, cuDNN, and GPU library dependencies. (This step often takes longer and is a common slow/stuck point.)
  • vi. Run tests and validation steps, if configured.
  • vii. Bundle everything into a new container image.
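Because step iii installs exactly the dependency file MLflow captured at logging time, a useful pre-flight check is to inspect that file before creating the endpoint. The sketch below uses `mlflow.pyfunc.get_model_dependencies` (a real MLflow API) plus a small helper to flag unpinned requirements, which are a common cause of slow or failing builds; the model URI is illustrative.

```python
# Hypothetical pre-flight check: look at the dependency file MLflow captured,
# since this is exactly what the container build will try to install.

def captured_requirements(model_uri: str) -> list[str]:
    """Fetch the requirements file MLflow logged with the model
    (requires mlflow and access to the model artifacts)."""
    import mlflow
    req_path = mlflow.pyfunc.get_model_dependencies(model_uri)
    with open(req_path) as f:
        return f.read().splitlines()

def unpinned(requirements: list[str]) -> list[str]:
    """Requirements without an exact pin; these force resolution work
    during the image build and make it slower and less reproducible."""
    return [
        r for r in requirements
        if r and not r.startswith("#") and "==" not in r
    ]
```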

d. Upload/Push the Image to the Registry

  • The built container image is uploaded (pushed) to a secure container registry (typically ECR on AWS or Azure Container Registry in Azure).
  • Metadata (container image SHA, status) is recorded in internal databases to track the workflow state.

 

5. Deployment to Serving Infrastructure

  • Once the new image is available, the control plane orchestrates the deployment of containerized pods using Kubernetes (KFServing-compatible) to run the image at scale.
  • Pods are allocated based on your endpoint compute/configuration (CPU/GPU, scaling settings).
  • Health checks are performed (for proper model loading and live prediction support).

 

6. Endpoint Readiness and Autoscaling

  • If health checks pass and at least one pod is available, the endpoint becomes ready and can accept requests.
  • Autoscaling infrastructure is connected (to scale up/down based on load).
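The readiness phase can be observed from the client side by polling the endpoint state (`GET /api/2.0/serving-endpoints/{name}`). The sketch below is hypothetical: `get_state` stands in for that HTTP call and is assumed to return the endpoint's state dict, e.g. `{"ready": "READY", "config_update": "NOT_UPDATING"}`.

```python
import time

def wait_until_ready(get_state, timeout_s=3600, interval_s=30):
    """Poll `get_state` until the endpoint reports READY, the update
    fails, or the timeout elapses. Returns True when ready."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state.get("ready") == "READY":
            return True
        if state.get("config_update") == "UPDATE_FAILED":
            raise RuntimeError("endpoint update failed")
        time.sleep(interval_s)
    return False
```

The generous default timeout reflects that container image creation alone can take tens of minutes, especially for GPU endpoints.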

 

7. Ongoing Lifecycle and Updates

  • If you update your model or dependencies, the workflow repeats (potentially triggering a new container build, image, and deployment, with zero-downtime switchover logic).
  • Image and pod lifecycles are managed robustly for performance, compliance, and security (with scheduled image obsolescence and deletion rules).

 

What Can Cause The 'Container Image Creation' Step ("stuck"/slow)?

  • Long dependency resolution/installation (especially for large GPU models or conflicting dependencies).
  • Network/misconfiguration issues reaching storage (S3, ACR, etc.) or pulling images.
  • Builds that require extensive downloads (conda-forge, pip, CUDA, etc.), or Docker build slowness.

  • For GPU serving, timeouts if the build takes more than 60 minutes (a retry is sometimes needed).
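When a build does stall, reproducing it locally often surfaces the underlying dependency error much faster than waiting on the managed workflow. The `mlflow models build-docker` CLI builds a serving image from the same logged artifacts; the command below is a sketch with an illustrative model URI and image name, and requires mlflow plus a local Docker daemon to actually run.

```python
# Hypothetical local reproduction of the container build to debug a stuck
# 'Container Image Creation' step. URI and image name are illustrative.

model_uri = "models:/main.ml_models.churn_classifier/1"
image_name = "churn-serving-debug"

cmd = ["mlflow", "models", "build-docker", "-m", model_uri, "-n", image_name]
# subprocess.run(cmd, check=True)  # requires mlflow and a Docker daemon
```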



Dnirmania
Contributor

Thank you @Vidhi_Khaitan for sharing the detailed process 🙂
