Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Serving Endpoint: Container image creation

Dnirmania
Contributor

Hi Team

Whenever I try to create an endpoint from a model in Databricks, the process often gets stuck at the 'Container Image Creation' step. I've tried to understand what happens during this step, but couldn't find any detailed or helpful information. Can someone explain the full sequence of steps Databricks performs in the background when serving a model endpoint?

Thanks,

Dinesh

1 ACCEPTED SOLUTION


Vidhi_Khaitan
Databricks Employee

Hi @Dnirmania,

Below is a detailed, sequenced breakdown of what happens in Databricks when you create a model serving endpoint:

1. Model Logging and Registration

  • You first log your trained model using MLflow in a compatible format, such as a built-in MLflow flavor (e.g., sklearn, pytorch, custom pyfunc, etc.).
  • Optionally, additional files such as requirements.txt (pip), conda.yaml, and code dependencies are packaged with the model. This step can also include specifying custom pip/conda environments and code paths.
  • The logged model is registered in Unity Catalog or Workspace Model Registry.
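The logging step above can be sketched as follows. This is a minimal, hypothetical example (model name, catalog path, and versions are illustrative, not from the original post); pinning exact versions in `pip_requirements` gives the later container build a deterministic environment to install.

```python
# Hypothetical sketch: log a scikit-learn model with pinned dependencies and
# register it in Unity Catalog. All names below are illustrative.

registered_name = "main.ml_models.churn_classifier"  # catalog.schema.model

pip_requirements = [
    "scikit-learn==1.4.2",
    "pandas==2.2.2",
]

def log_and_register(model):
    """Log `model` under the MLflow sklearn flavor and register it in
    Unity Catalog (requires mlflow and a configured tracking server)."""
    import mlflow
    mlflow.set_registry_uri("databricks-uc")
    with mlflow.start_run():
        mlflow.sklearn.log_model(
            sk_model=model,
            artifact_path="model",
            pip_requirements=pip_requirements,
            registered_model_name=registered_name,
        )
```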

 

2. Endpoint Creation Request

  • You initiate endpoint creation, specifying which registered model version (and compute type: CPU, GPU, etc.) should be served.
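A creation request of this shape can also be sent to the Serving REST API (`POST /api/2.0/serving-endpoints`). The payload below is a hedged sketch; the endpoint name, model name, and workload size are placeholders, and the actual HTTP call (with a bearer token) is left commented out.

```python
import json

# Hypothetical request body for POST /api/2.0/serving-endpoints.
# Names and sizes are illustrative.
endpoint_config = {
    "name": "churn-endpoint",
    "config": {
        "served_entities": [
            {
                "entity_name": "main.ml_models.churn_classifier",
                "entity_version": "1",
                "workload_type": "CPU",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
}
body = json.dumps(endpoint_config)
# A real call would send `body` with an Authorization: Bearer <token> header,
# e.g. via urllib.request or the databricks-sdk.
```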

 

3. Background Infrastructure Orchestration Begins

Internally, a state machine-driven workflow (the control plane) handles endpoint provisioning, with the major next step being Container Image Creation.

 

 

4. Container Image Creation: Step-by-Step Technical Workflow

a. Gathering Model Artifacts and Environment Metadata

  • The control plane retrieves your model's artifacts (model, code, dependency files) and extracts environment specifications (pip/conda requirements, code paths, etc.).

b. Triggering the Container Build

  • The system launches a container build job to package your model and environment into a Docker container.
    • On Databricks AWS deployments, this typically means invoking a builder workflow like AWS CodeBuild. Alternative Kubernetes-based isolation may also be used.

c. Container Build Steps (Inside the Builder Job)

  • i. Start with a Databricks-maintained base image (usually Ubuntu plus minimal Python/ML tooling).
  • ii. Copy in your model artifacts.
  • iii. Install all pip or conda dependencies declared in the model artifacts.
    • For MLflow native flavor models, dependencies are extracted from automatically-captured files (pip_requirements.txt or conda.yaml).
    • For custom models, extra requirements and code dependencies are also installed as needed.
  • iv. Install the Databricks scoring server (MLflow server or Databricks' custom inference server) that will actually serve prediction requests inside the container.
  • v. For GPU endpoints, install specialized CUDA, cuDNN, and GPU library dependencies. (This step often takes longer and is a common slow/stuck point.)
  • vi. Run tests and validation steps, if configured.
  • vii. Bundle everything into a new container image.
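Because step iii installs exactly the dependency file MLflow captured at logging time, a useful pre-flight check is to inspect that file before creating the endpoint. The sketch below uses `mlflow.pyfunc.get_model_dependencies` (a real MLflow API) plus a small helper to flag unpinned requirements, which are a common cause of slow or failing builds; the model URI is illustrative.

```python
# Hypothetical pre-flight check: look at the dependency file MLflow captured,
# since this is exactly what the container build will try to install.

def captured_requirements(model_uri: str) -> list[str]:
    """Fetch the requirements file MLflow logged with the model
    (requires mlflow and access to the model artifacts)."""
    import mlflow
    req_path = mlflow.pyfunc.get_model_dependencies(model_uri)
    with open(req_path) as f:
        return f.read().splitlines()

def unpinned(requirements: list[str]) -> list[str]:
    """Requirements without an exact pin; these force resolution work
    during the image build and make it slower and less reproducible."""
    return [
        r for r in requirements
        if r and not r.startswith("#") and "==" not in r
    ]
```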

d. Upload/Push the Image to the Registry

  • The built container image is uploaded (pushed) to a secure container registry (typically ECR on AWS or Azure Container Registry in Azure).
  • Metadata (container image SHA, status) is recorded in internal databases to track the workflow state.

 

5. Deployment to Serving Infrastructure

  • Once the new image is available, the control plane orchestrates the deployment of containerized pods using Kubernetes (KFServing-compatible) to run the image at scale.
  • Pods are allocated based on your endpoint compute/configuration (CPU/GPU, scaling settings).
  • Health checks are performed (for proper model loading and live prediction support).

 

6. Endpoint Readiness and Autoscaling

  • If health checks pass and at least one pod is available, the endpoint becomes ready and can accept requests.
  • Autoscaling infrastructure is connected (to scale up/down based on load).
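The readiness phase can be observed from the client side by polling the endpoint state (`GET /api/2.0/serving-endpoints/{name}`). The sketch below is hypothetical: `get_state` stands in for that HTTP call and is assumed to return the endpoint's state dict, e.g. `{"ready": "READY", "config_update": "NOT_UPDATING"}`.

```python
import time

def wait_until_ready(get_state, timeout_s=3600, interval_s=30):
    """Poll `get_state` until the endpoint reports READY, the update
    fails, or the timeout elapses. Returns True when ready."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state.get("ready") == "READY":
            return True
        if state.get("config_update") == "UPDATE_FAILED":
            raise RuntimeError("endpoint update failed")
        time.sleep(interval_s)
    return False
```

The generous default timeout reflects that container image creation alone can take tens of minutes, especially for GPU endpoints.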

 

7. Ongoing Lifecycle and Updates

  • If you update your model or dependencies, the workflow repeats (potentially triggering a new container build, image, and deployment, with zero-downtime switchover logic).
  • Image and pod lifecycles are managed robustly for performance, compliance, and security (with scheduled image obsolescence and deletion rules).

 

What Can Cause The 'Container Image Creation' Step ("stuck"/slow)?

  • Long dependency resolution/installation (especially for large GPU models or conflicting dependencies).
  • Network/misconfiguration issues reaching storage (S3, ACR, etc.) or pulling images.
  • Builds that require extensive downloads (conda-forge, pip, CUDA, etc.), or Docker build slowness.

  • For GPU serving, timeouts if the build takes more than 60 minutes (a retry is sometimes needed).
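When a build does stall, reproducing it locally often surfaces the underlying dependency error much faster than waiting on the managed workflow. The `mlflow models build-docker` CLI builds a serving image from the same logged artifacts; the command below is a sketch with an illustrative model URI and image name, and requires mlflow plus a local Docker daemon to actually run.

```python
# Hypothetical local reproduction of the container build to debug a stuck
# 'Container Image Creation' step. URI and image name are illustrative.

model_uri = "models:/main.ml_models.churn_classifier/1"
image_name = "churn-serving-debug"

cmd = ["mlflow", "models", "build-docker", "-m", model_uri, "-n", image_name]
# subprocess.run(cmd, check=True)  # requires mlflow and a Docker daemon
```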



Dnirmania
Contributor

Thank you @Vidhi_Khaitan for sharing the detailed process 🙂
