Technical Blog
mmt
Databricks Employee

Introduction

Ultralytics YOLO [1] (You Only Look Once) is one of the most widely used computer vision frameworks. It is fast, accurate, and well supported, with a range of model sizes (from nano to extra-large) so you can trade off speed and accuracy for edge or server deployment. Training and inference are straightforward with a Python API and practical documentation, and the ecosystem offers readily available pretrained weights, support for standard datasets (e.g. COCO), and active ongoing model development, as exemplified by recent advances in YOLO11.

For teams adopting computer vision (CV) tasks on Databricks, Ultralytics YOLO is a practical choice for both prototyping and production pipelines. The framework supports multiple CV tasks (object detection, classification, segmentation, pose estimation, and oriented bounding boxes (OBB)), each with models in several sizes (nano to extra-large, often denoted n, s, m, l, x).

 

mmt_0-1774033788883.png

Figure 1: Common computer vision tasks and their associated annotation type. 

 

This post demonstrates a single-node workflow for training an object detection model on Databricks AI Runtime — scalable, serverless NVIDIA GPU compute. We use the nano YOLO model, YOLO11n, for real-time performance that outputs bounding boxes, class labels, and confidence scores. The process covers training YOLO11n on the COCO128 dataset (demo-only; refer to Data preparation for production guidance) and deploying it to Model Serving. Deployment includes a custom MLflow Pyfunc wrapper to handle base64 image input to the YOLO model and structured bounding-box output from model prediction.

 

Why AI Runtime and single-node?

Critically, running YOLO on Databricks AI Runtime lets you train and iterate without provisioning or managing clusters: you get GPU compute on demand, pay for what you use, and when you are done the compute is terminated. This makes it ideal for experimentation, proof-of-concept, and small-to-medium training jobs—and MLflow and Unity Catalog keep experiments and artifacts organized.

Single-node (one GPU instance) keeps the workflow simple and sufficient for many object-detection use cases. YOLO11n is a small model; training on datasets in the low thousands to tens of thousands of images often fits comfortably on one GPU (e.g. A10). A single node avoids distributed-training setup, multi-worker debugging, and extra cost—so you can focus on data, labels, and the MLflow-to-serving path. 

When your dataset or model grows and training time becomes a bottleneck, you can move to multi-GPU or multi-node patterns; the same registration and deployment steps in this post still apply. 

 

What you’ll need to get started

Add the notebook via Import a notebook or Clone a Git repo (Repos), then attach to Serverless GPU: 

In the notebook's Compute dropdown, choose Connect and select Serverless GPU. In the Environment panel on the right-hand edge of the notebook, select A10 as the Accelerator and AI v4 as the Base environment. Finally, click Apply and then Confirm, as shown in the figure below.

mmt_0-1774056975172.png

Figure 2: Connecting to the AI Runtime Serverless GPU cluster and configuring the notebook environment:

Connect → Serverless GPU → Environment → Accelerator: A10, Environment: AI v4 → Apply & Confirm

 

Because packages and dependencies are installed in the first notebook cell, there is no need to install them via the cluster environment panel.

 

Workflow overview

The notebook walks through six steps in order; each builds on the previous one.

mmt_2-1774033788884.png

Figure 3. An end-to-end workflow. 

 

  1. Setup: Environment and Unity Catalog

After attaching to Serverless GPU (see above: Connect → Serverless GPU → Accelerator: A10, Environment: AI v4 → Apply and Confirm), the first steps are to install the required Python packages and configure your Unity Catalog project structure.

Install MLflow, Ultralytics, and supporting packages (e.g. nvidia-ml-py, threadpoolctl). Restart Python after the first %pip cell, then set a writable YOLO config directory to avoid permission issues.

# Package installation for AI Runtime (run once, then restart Python) 

%pip install -U "mlflow>=3.0"
%pip install ultralytics==8.3.204
%pip install nvidia-ml-py==13.580.82
%pip install threadpoolctl==3.1.0  
dbutils.library.restartPython()  
  
# Set writable YOLO config dir (avoids permission errors) 
  
import os, uuid 
config_dir = f'/tmp/yolo_config_{uuid.uuid4().hex[:8]}'  
os.environ['YOLO_CONFIG_DIR'] = config_dir  
os.makedirs(config_dir, exist_ok=True)

Next, create or use a catalog, schema, and Unity Catalog Volume for data, raw models, and model checkpoints from training runs. Use widgets for catalog, schema, volume, and model name so the same notebook can be reused across workspaces.

# Widgets for catalog, schema, volume, model name
# (define once so the notebook is reusable across workspaces)
dbutils.widgets.text("catalog_name", "main")      # e.g. "main"
dbutils.widgets.text("schema_name", "default")    # e.g. "default"
dbutils.widgets.text("volume_name", "yolo_sgc")   # e.g. "yolo_sgc"

catalog_name = dbutils.widgets.get("catalog_name")
schema_name = dbutils.widgets.get("schema_name")
volume_name = dbutils.widgets.get("volume_name")
  
spark.sql(f"CREATE SCHEMA IF NOT EXISTS `{catalog_name}`.`{schema_name}`")  
spark.sql(f"CREATE VOLUME IF NOT EXISTS `{catalog_name}`.`{schema_name}`.`{volume_name}`")  
  
project_location = f'/Volumes/{catalog_name}/{schema_name}/{volume_name}/'  
os.makedirs(f'{project_location}runs/', exist_ok=True)  
os.makedirs(f'{project_location}data/', exist_ok=True)  
os.makedirs(f'{project_location}raw_model/', exist_ok=True)

 

  2. Data preparation

The dataset is configured via a YAML file (path, splits, class names). We download the COCO128 config and data to the volume, then split it into train (62.5%), validation (18.75%), and test (18.75%) with a fixed seed, updating the YAML with the new paths. For custom data, you typically adjust the YAML for your paths and classes. We use the Ultralytics coco128.yaml, downloaded to the UC Volume, but you can substitute your own config (e.g., data.yaml).

# Download COCO128 dataset configuration to UC Volume 
import yaml 

os.makedirs(f'{project_location}data/coco128', exist_ok=True)
config_url = "https://github.com/ultralytics/ultralytics/raw/main/ultralytics/cfg/datasets/coco128.yaml"
config_path = f"{project_location}data/coco128.yaml"

download_file(config_url, config_path, "COCO128 config") 

# Then load config, set data['path'] to volume path, download/extract dataset if needed, save updated YAML   

Split the data and update the YAML with train/val/test image paths:

train_size, val_size, test_size = split_dataset( 
    source_images_dir=f"{project_location}data/coco128/images/train2017",
    source_labels_dir=f"{project_location}data/coco128/labels/train2017",
    base_images_dir=f"{project_location}data/coco128/images",
    base_labels_dir=f"{project_location}data/coco128/labels",
    train_ratio=0.625,   # 62.5%
    val_ratio=0.1875,    # 18.75%  
    random_seed=42  
)
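split_dataset is a notebook helper; the core of a deterministic split like the one it performs can be sketched with the standard library alone (directory handling omitted, function name hypothetical):

```python
import random

def split_indices(n_images, train_ratio=0.625, val_ratio=0.1875, seed=42):
    """Deterministically partition image indices into train/val/test."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)      # fixed seed => reproducible split
    n_train = int(n_images * train_ratio)
    n_val = int(n_images * val_ratio)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # remainder (~18.75%)
    return train, val, test

train, val, test = split_indices(128)
print(len(train), len(val), len(test))  # 80 24 24
```

With COCO128's 128 images, these ratios yield an 80/24/24 split; the fixed seed makes the split reproducible across reruns.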

# In the notebook: update data.yaml so 'train', 'val', 'test' point to the new split dirs (e.g. .../images/train, .../images/val, .../images/test)  
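The YAML update itself can be sketched roughly as follows (paths and function name are illustrative; the notebook's actual update logic may differ):

```python
import yaml

def update_dataset_yaml(yaml_path, dataset_root):
    """Point the dataset config's path/train/val/test at the split directories."""
    with open(yaml_path) as f:
        data = yaml.safe_load(f)
    data["path"] = dataset_root        # dataset root on the UC Volume
    data["train"] = "images/train"     # relative to 'path'
    data["val"] = "images/val"
    data["test"] = "images/test"
    with open(yaml_path, "w") as f:
        yaml.safe_dump(data, f, sort_keys=False)
    return data
```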

Important note: COCO128 is used here only for demonstration. With ~128 images it is too small for production and will overfit. For real use cases, use larger datasets (e.g. 100K+ images or 1K+ domain-specific images). The same workflow applies—update data paths and config as needed.

 

  3. MLflow: Custom PyFunc Wrapper and Configuration

To deploy the trained YOLO model to Model Serving, we need a single, serializable API: the endpoint will receive requests (e.g., base64-encoded images) and return structured responses (e.g., bounding boxes). YOLO’s native API expects file paths or NumPy arrays and returns a rich in-memory object, which is not what the serving layer expects.

The notebook therefore defines an MLflow custom PyFunc wrapper, YOLOWrapper(mlflow.pyfunc.PythonModel), that accepts a DataFrame with an image_base64 column and returns a DataFrame of detections (class, confidence, bbox columns).

The wrapper class has three methods:

  • load_context loads the .pt artifact into self.model when the model is loaded (e.g. at serving startup);
  • predict accepts a DataFrame with image_base64, decodes each image, runs YOLO, and returns a DataFrame via _format_predictions;
  • _format_predictions converts YOLO’s Results (.boxes, .names) into a single DataFrame with class name, class id, confidence, and bbox columns (xyxy and xywh).

We define the wrapper now so it's ready to use immediately after training completes in Step 4.

class YOLOWrapper(mlflow.pyfunc.PythonModel):
    """Custom MLflow wrapper for YOLO models using base64-encoded images."""  

    def load_context(self, context):
        """Load YOLO model from artifacts (called once when model is loaded)."""  
        from ultralytics import YOLO  
        model_path = context.artifacts["yolo_model"]  
        self.model = YOLO(model_path, task='detect')  

    def _format_predictions(self, predictions):
        """Convert YOLO Results to a single DataFrame with class, confidence, bbox columns."""  
        import pandas as pd  
        all_results = []  
        for prediction in predictions:  
            if prediction.boxes is not None:  
                boxes = prediction.boxes  
                for i in range(len(boxes)):  
                    box_xyxy = boxes.xyxy[i].cpu().numpy()  
                    box_xywh = boxes.xywh[i].cpu().numpy()  
                    all_results.append({   
                        "class_name": prediction.names[int(boxes.cls[i])],  
                        "class_num": int(boxes.cls[i]),  
                        "confidence": float(boxes.conf[i]),  
                        "bbox_x1": float(box_xyxy[0]), "bbox_y1": float(box_xyxy[1]),  
                        "bbox_x2": float(box_xyxy[2]), "bbox_y2": float(box_xyxy[3]),  
                        "bbox_center_x": float(box_xywh[0]), "bbox_center_y": float(box_xywh[1]),  
                        "bbox_width": float(box_xywh[2]), "bbox_height": float(box_xywh[3]),  
                    })  
        return pd.DataFrame(all_results)  

    def predict(self, context, model_input):
        """Accept DataFrame with image_base64; decode, run YOLO, return DataFrame of detections."""
        import pandas as pd  
        import base64  
        from PIL import Image  
        import io  
        import numpy as np

        if 'image_base64' not in model_input.columns:  
            raise ValueError("DataFrame must contain 'image_base64' column")  
        all_predictions = []  

        for image_base64 in model_input['image_base64'].tolist():  
            image_bytes = base64.b64decode(image_base64)  
            image_array = np.array(Image.open(io.BytesIO(image_bytes)))  
            predictions = self.model.predict(image_array, verbose=False)  
            all_predictions.extend(predictions)

        return self._format_predictions(all_predictions)
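Before wiring the wrapper into serving, you can sanity-check the input contract it expects — a DataFrame whose image_base64 column decodes back to a valid image array. A standalone sketch with a synthetic image (no YOLO model required):

```python
import base64
import io

import numpy as np
import pandas as pd
from PIL import Image

# Build a small synthetic RGB image and encode it the way a client would
image = Image.new("RGB", (64, 48), color=(255, 0, 0))
buffer = io.BytesIO()
image.save(buffer, format="JPEG")
image_base64 = base64.b64encode(buffer.getvalue()).decode("utf-8")

# This is exactly the DataFrame shape YOLOWrapper.predict expects
input_df = pd.DataFrame({"image_base64": [image_base64]})

# Replicate the wrapper's decode step and confirm a valid HxWxC array comes back
decoded = np.array(Image.open(io.BytesIO(base64.b64decode(input_df["image_base64"][0]))))
print(decoded.shape)  # (48, 64, 3)
```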

3.1 MLflow Configuration

We infer the model signature from a sample prediction using this custom wrapper (DataFrame with image_base64 input and detection columns output). We also set the MLflow experiment (e.g., under /Workspace/Shared/) and enable system metrics logging, along with YOLO’s MLflow integration and MLflow autologging.

# Infer signature from a sample image (input: base64, output: bbox columns)   
signature, input_example = infer_model_signature(model_path, sample_images[0])  
  
# Enable system metrics and set experiment   
experiment_name, experiment_id = setup_mlflow_experiment(  
    use_workspaceUsers_path=False,  
    expt_name_suffix="Experiments_YOLO_CoCo"  
)
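infer_model_signature is a notebook helper; conceptually, it runs the wrapper on one sample image and derives the signature from the resulting input and output frames. The frames it works from look roughly like this (values are illustrative; column names match the wrapper above):

```python
import pandas as pd

# Input side of the contract: one base64-encoded image per row
sample_input = pd.DataFrame({"image_base64": ["<base64-encoded JPEG bytes>"]})

# Output side: one row per detection, mirroring _format_predictions
sample_output = pd.DataFrame([{
    "class_name": "person",
    "class_num": 0,
    "confidence": 0.91,
    "bbox_x1": 10.0, "bbox_y1": 20.0, "bbox_x2": 110.0, "bbox_y2": 220.0,
    "bbox_center_x": 60.0, "bbox_center_y": 120.0,
    "bbox_width": 100.0, "bbox_height": 200.0,
}])

# In the notebook, something like:
#   from mlflow.models import infer_signature
#   signature = infer_signature(sample_input, sample_output)
```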

The model is then registered to Unity Catalog (mlflow.pyfunc.log_model) using this wrapper and the best checkpoint (.pt) artifact; this call is made after training in Step 4:

mlflow.pyfunc.log_model(  
    name="model",  
    python_model=YOLOWrapper(),  
    artifacts={"yolo_model": model_path},  
    signature=signature,  
    input_example=input_example,  
    registered_model_name=registered_model_name,  
    pip_requirements=["ultralytics==...", "cloudpickle==...", "torch", "torchvision", "pillow", "numpy"],  
)

 

  4. Model training

Train YOLO11n with your chosen hyperparameters (epochs, batch size, learning rate, patience, dropout, weight decay); these are specified in the config variables within model.train() as shown in the code snippet. Training runs in a unique temp directory, and the results and validation metrics are copied into the volume under a named run folder ({task}_{model}_{dataset}_{timestamp}_run_{run_id}). The best checkpoint is saved and is then registered to Unity Catalog with the custom PyFunc wrapper (base64 in, structured detections out) defined in the previous step. 

model = YOLO(model_path)
results = model.train(
    task="detect",
    batch=4,
    device=0,                    # Single GPU for Serverless AI Runtime  
    data=data_yaml_path,
    epochs=100,
    lr0=0.001,
    project=project_location,
    name=f"run_{timestamp}",
    patience=5,                  # Update where appropriate     
    dropout=0.2,
    weight_decay=0.0005,
    save=True,
)
run_id = mlflow.last_active_run().info.run_id

# Register to Unity Catalog with custom PyFunc wrapper (base64 in, bbox out) 
registered_model_name = register_yolo_model(
    run_id=run_id,
    model_path=best_model_path,
    catalog_name=catalog_name,
    schema_name=schema_name,
    model_name=model_name,
    signature=signature,
    input_example=input_example,
    data_yaml_path=data_yaml_path,
)

 

  5. Model evaluation

Evaluate the registered model on validation and test sets (sample predictions and metrics), then run a local serving test by loading the model via mlflow.pyfunc.load_model() and calling it with base64-encoded images to confirm the same interface the endpoint will use.

# Local serving test: same I/O as the deployed endpoint 
model_uri = f"models:/{registered_model_name}/{latest_version}" 
serving_model = mlflow.pyfunc.load_model(model_uri)

with open(test_image_path, 'rb') as f:
    image_base64 = base64.b64encode(f.read()).decode('utf-8')
input_df = pd.DataFrame({"image_base64": [image_base64]})
predictions = serving_model.predict(input_df)  # DataFrame with class_name, confidence, bbox_*
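As a quick visual check, the returned bbox columns can be drawn back onto the source image. A minimal sketch using Pillow (function name hypothetical; assumes the predictions DataFrame shape shown above):

```python
import pandas as pd
from PIL import Image, ImageDraw

def draw_detections(image, predictions, min_confidence=0.5):
    """Draw labeled boxes from the wrapper's output DataFrame onto a PIL image."""
    draw = ImageDraw.Draw(image)
    for _, row in predictions.iterrows():
        if row["confidence"] < min_confidence:
            continue  # skip low-confidence detections
        box = (row["bbox_x1"], row["bbox_y1"], row["bbox_x2"], row["bbox_y2"])
        draw.rectangle(box, outline="red", width=2)
        draw.text((row["bbox_x1"], row["bbox_y1"] - 10),
                  f'{row["class_name"]} {row["confidence"]:.2f}', fill="red")
    return image

# e.g. draw_detections(Image.open(test_image_path), predictions).save("annotated.jpg")
```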

 

  6. Model deployment

After a manual checkpoint (e.g. a “Proceed with Deployment” widget), you can create or update a Custom Model Serving endpoint. The deployment configuration includes: 

  • WorkspaceClient SDK – enables programmatic endpoint management, ensuring deployments are repeatable, version-controlled, and integrated into automated workflows; 
  • Small endpoint workload compute size and enabling scale-to-zero — minimizes compute costs during development and evaluation by provisioning resources on demand and releasing them when the endpoint is idle; 
  • AI Gateway inference tables — automatically logs all request and response payloads to a Unity Catalog delta table, providing a built-in audit trail for monitoring, debugging, and downstream evaluation without additional instrumentation. 

6.1 Create the endpoint with AI Gateway enabled:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    ServedEntityInput, EndpointCoreConfigInput,
    AiGatewayConfig, AiGatewayInferenceTableConfig,
)

w = WorkspaceClient()
w.serving_endpoints.create( 
    name=endpoint_name,
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name=registered_model_name,
                entity_version=str(model_version),
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
    ai_gateway=AiGatewayConfig(
        inference_table_config=AiGatewayInferenceTableConfig(
            catalog_name=catalog_name,
            schema_name=schema_name,
            table_name_prefix=endpoint_name,
            enabled=True,
        )
    ),
)

Test the deployed endpoint by calling it with a base64-encoded image and verify the structured bounding-box response.

6.2 Call the endpoint with base64 input (same as local PyFunc test):

import base64

with open(test_image_path, 'rb') as f:
    image_base64 = base64.b64encode(f.read()).decode('utf-8')


response = w.serving_endpoints.query(
    name=endpoint_name,
    dataframe_records=[{"image_base64": image_base64}],
)

# Response contains DataFrame with class_name, confidence, bbox_x1, bbox_y1, ... 
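The response can be turned back into the same detections DataFrame for downstream use. A sketch, assuming the endpoint returns its predictions as a list of per-detection records (the exact response shape depends on the client and SDK version):

```python
import pandas as pd

# Illustrative payload in the shape the wrapper produces (one record per detection)
predictions_payload = [
    {"class_name": "person", "class_num": 0, "confidence": 0.91,
     "bbox_x1": 10.0, "bbox_y1": 20.0, "bbox_x2": 110.0, "bbox_y2": 220.0},
    {"class_name": "dog", "class_num": 16, "confidence": 0.34,
     "bbox_x1": 5.0, "bbox_y1": 5.0, "bbox_x2": 50.0, "bbox_y2": 60.0},
]
# e.g. predictions_payload = response.predictions  (with the SDK response above)

detections = pd.DataFrame(predictions_payload)
confident = detections[detections["confidence"] >= 0.5]
print(confident["class_name"].tolist())  # ['person']
```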

Successful testing confirms that the custom PyFunc wrapper handles base64-encoded image input and returns structured bounding-box output from the Model Serving endpoint; the key technical considerations behind this design are detailed next.

 

Key technical details

A few details are worth calling out for implementation and operations:

  • Base64 input: The custom MLflow PyFunc wrapper accepts a DataFrame with an image_base64 column, where images are encoded as base64 strings.  In our example, JPEG images are used; other formats (e.g. PNG, BMP) may work but have not been validated here. Base64 encoding keeps the API simple and works across network boundaries for Model Serving.
  • Bounding box output: The wrapper returns a DataFrame with columns such as class_name, class_num, confidence, bbox_x1, bbox_y1, bbox_x2, bbox_y2, and center/width/height. This structure is inferred once and used for the registered model’s signature.
  • Why we use a custom wrapper: YOLO’s native API expects image paths or NumPy arrays and returns a rich in-memory object; Model Serving expects a single, serializable contract (DataFrame input and output) for HTTP requests and responses. The wrapper (1) accepts base64-encoded images in a DataFrame column (JSON-friendly), (2) loads the .pt artifact and runs the YOLO object detection task inside predict, and (3) returns a structured DataFrame of detections that the endpoint can serialize. Without it, the raw YOLO model could not be deployed as a standard PyFunc. The full implementation is in Step 3 above.
  • Unity Catalog Volume structured layout: Data is stored in /Volumes/{catalog}/{schema}/{volume}/data/. Pretrained weights are located in raw_model/. Each training run has its own dedicated folder under runs/{task}_{model}_{dataset}_{timestamp}_run_{run_id}/, which includes subfolders for train/validation_metrics/, validation_samples/, and test_samples/.
  • Deployment safety: A parameter widget (e.g. “Proceed with Deployment”) gates the deployment cells so “Run All” doesn’t deploy by accident. Endpoint creation/update can take on the order of 10–20 minutes; the notebook can exit early and direct you to re-run or check the UI.
  • AI Gateway: New endpoints are created with AI Gateway inference table config (catalog, schema, table name prefix). The payload table is created and populated after the first requests; there can be a short delay before rows appear. You can query the table (e.g. SELECT * FROM catalog.schema.endpoint_payload ORDER BY timestamp_ms DESC) to inspect logged requests.

 

Conclusion

The ability to train or fine-tune and deploy YOLO (You Only Look Once) models on the Databricks Data Intelligence Platform provides enterprises with a high-performance, cost-optimized, and easily adoptable Computer Vision (CV) solution. Our walkthrough shows a complete path from raw images to a live YOLO endpoint on Databricks: no cluster provisioning, full MLflow tracking, Unity Catalog governance, and production-ready serving with built-in request logging. Swap COCO128 for your own dataset and the same workflow applies. As your data or model complexity grows, the same registration and deployment steps extend to multi-GPU and multi-node training patterns.

 

mmt_3-1774033788904.png

Figure 4: Example validation of YOLO object detection inference on a sample of COCO128 images

 

Next Steps: 

Here's how you can try this out:

  • Single-Node YOLO Example: Clone the cv-playground repository and run the notebook in your Databricks workspace. Attach it to an AI Runtime (A10) with the AI v4 environment. After running the cells, swap the COCO128 dataset for your own data and paths.
  • Larger-Scale Instance Segmentation: For a bigger example using YOLO, check out the NuInsSeg project within the same repository.
  • Monitor and Improve: Use the AI Gateway inference table to monitor model traffic, debug inputs and outputs, and feed insights back into your analytics or model improvement pipeline.

 

 Stay tuned for the follow-up post on multi-GPU and multi-node YOLO model training on AI Runtime!

 

[1] Ultralytics YOLO is dual-licensed: AGPL-3.0 (default) or Enterprise for commercial use. Users should review https://www.ultralytics.com/license to determine which applies to their use case.