Technical Blog
mmt
Databricks Employee

Introduction

Ultralytics YOLO [1] (You Only Look Once) is one of the most widely used computer vision frameworks. It is fast, accurate, and well supported, with a range of model sizes (from nano to extra-large) so you can trade off speed and accuracy for edge or server deployment. Training and inference are straightforward with a Python API and practical documentation, and the ecosystem offers readily available pretrained weights, support for standard datasets (e.g. COCO), and active ongoing model development, as exemplified by recent advances in YOLO11.

For teams adopting computer vision (CV) tasks on Databricks, Ultralytics YOLO is a practical choice for both prototyping and production pipelines. The framework supports multiple CV tasks (object detection, classification, segmentation, pose estimation, and oriented bounding boxes (OBB)), each with models in several sizes (nano to extra-large, often denoted n, s, m, l, x).

 

mmt_0-1774033788883.png

Figure 1: Common computer vision tasks and their associated annotation type. 

 

This post demonstrates a single-node workflow for training an object detection model on Databricks AI Runtime — scalable, serverless NVIDIA GPU compute. We use the nano YOLO model, YOLO11n, for real-time performance that outputs bounding boxes, class labels, and confidence scores. The process covers training YOLO11n on the COCO128 dataset (demo-only; refer to Data preparation for production guidance) and deploying it to Model Serving. Deployment includes a custom MLflow Pyfunc wrapper to handle base64 image input to the YOLO model and structured bounding-box output from model prediction.

 

Why AI Runtime and single-node?

Critically, running YOLO on Databricks AI Runtime lets you train and iterate without provisioning or managing clusters: you get GPU compute on demand, pay for what you use, and when you are done the compute is terminated. This makes it ideal for experimentation, proof-of-concept, and small-to-medium training jobs—and MLflow and Unity Catalog keep experiments and artifacts organized.

Single-node (one GPU instance) keeps the workflow simple and sufficient for many object-detection use cases. YOLO11n is a small model; training on datasets in the low thousands to tens of thousands of images often fits comfortably on one GPU (e.g. A10). A single node avoids distributed-training setup, multi-worker debugging, and extra cost—so you can focus on data, labels, and the MLflow-to-serving path. 

When your dataset or model grows and training time becomes a bottleneck, you can move to multi-GPU or multi-node patterns; the same registration and deployment steps in this post still apply. 

 

What you’ll need to get started

Add the notebook via Import a notebook or Clone a Git repo (Repos), then attach to Serverless GPU: 

In the notebook's Compute dropdown, choose Connect and select Serverless GPU. In the Environment panel on the right-hand edge of the notebook, select A10 as the Accelerator and AI v4 as the Base environment. Finally, click Apply and then Confirm, as shown in the figure below.

mmt_0-1774056975172.png

Figure 2: Connecting to the AI Runtime Serverless GPU cluster and configuring the notebook environment:

Connect → Serverless GPU → Environment → Accelerator: A10, Environment: AI v4 → Apply & Confirm

 

Because packages and dependencies are installed in the first notebook cell, there is no need to install them via the cluster environment panel.

 

Workflow overview

The notebook walks through six steps in order; each builds on the previous one.

mmt_2-1774033788884.png

Figure 3. An end-to-end workflow. 

 

  1. Setup: Environment and Unity Catalog

After attaching to Serverless GPU (see above: Connect → Serverless GPU → Accelerator: A10, Environment: AI v4 → Apply and Confirm), the first steps are to install the required Python packages and configure your Unity Catalog project structure.

Install MLflow, Ultralytics, and supporting packages (e.g. nvidia-ml-py, threadpoolctl). Restart Python after the first %pip cell, then set a writable YOLO config directory to avoid permission issues.

# Package installation for AI Runtime (run once, then restart Python) 

%pip install -U "mlflow>=3.0"
%pip install ultralytics==8.3.204
%pip install nvidia-ml-py==13.580.82
%pip install threadpoolctl==3.1.0  
dbutils.library.restartPython()  
  
# Set writable YOLO config dir (avoids permission errors) 
  
import os, uuid 
config_dir = f'/tmp/yolo_config_{uuid.uuid4().hex[:8]}'  
os.environ['YOLO_CONFIG_DIR'] = config_dir  
os.makedirs(config_dir, exist_ok=True)

Next, create or use a catalog, schema, and Unity Catalog Volume for data, raw models, and model checkpoints from training runs. Use widgets for catalog, schema, volume, and model name so the same notebook can be reused across workspaces.

# Widgets for catalog, schema, volume, model name
# (define once so the notebook is reusable across workspaces)
dbutils.widgets.text("catalog_name", "main")      # e.g. "main"
dbutils.widgets.text("schema_name", "default")    # e.g. "default"
dbutils.widgets.text("volume_name", "yolo_sgc")   # e.g. "yolo_sgc"

catalog_name = dbutils.widgets.get("catalog_name")
schema_name = dbutils.widgets.get("schema_name")
volume_name = dbutils.widgets.get("volume_name")
  
spark.sql(f"CREATE SCHEMA IF NOT EXISTS `{catalog_name}`.`{schema_name}`")  
spark.sql(f"CREATE VOLUME IF NOT EXISTS `{catalog_name}`.`{schema_name}`.`{volume_name}`")  
  
project_location = f'/Volumes/{catalog_name}/{schema_name}/{volume_name}/'  
os.makedirs(f'{project_location}runs/', exist_ok=True)  
os.makedirs(f'{project_location}data/', exist_ok=True)  
os.makedirs(f'{project_location}raw_model/', exist_ok=True)

 

  2. Data preparation

The dataset is configured via a YAML file (path, splits, class names). We download the COCO128 config and data to the volume, then split it into train (62.5%), validation (18.75%), and test (18.75%) with a fixed seed, updating the YAML with the new paths. For custom data, you typically adjust the YAML for your paths and classes. We use the Ultralytics coco128.yaml, downloaded to the UC Volume, but you can substitute your own config (e.g., data.yaml).

# Download COCO128 dataset configuration to UC Volume 
import yaml 

os.makedirs(f'{project_location}data/coco128', exist_ok=True)
config_url = "https://github.com/ultralytics/ultralytics/raw/main/ultralytics/cfg/datasets/coco128.yaml"
config_path = f"{project_location}data/coco128.yaml"

download_file(config_url, config_path, "COCO128 config") 

# Then load config, set data['path'] to volume path, download/extract dataset if needed, save updated YAML   

Split the data and update the YAML with train/val/test image paths:

train_size, val_size, test_size = split_dataset( 
    source_images_dir=f"{project_location}data/coco128/images/train2017",
    source_labels_dir=f"{project_location}data/coco128/labels/train2017",
    base_images_dir=f"{project_location}data/coco128/images",
    base_labels_dir=f"{project_location}data/coco128/labels",
    train_ratio=0.625,   # 62.5%
    val_ratio=0.1875,    # 18.75%  
    random_seed=42  
)
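split_dataset is a notebook helper; the core of a deterministic split like the one it performs can be sketched with the standard library alone (directory handling omitted, function name hypothetical):

```python
import random

def split_indices(n_images, train_ratio=0.625, val_ratio=0.1875, seed=42):
    """Deterministically partition image indices into train/val/test."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)      # fixed seed => reproducible split
    n_train = int(n_images * train_ratio)
    n_val = int(n_images * val_ratio)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # remainder (~18.75%)
    return train, val, test

train, val, test = split_indices(128)
print(len(train), len(val), len(test))  # 80 24 24
```

With COCO128's 128 images, these ratios yield an 80/24/24 split; the fixed seed makes the split reproducible across reruns.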

# In the notebook: update data.yaml so 'train', 'val', 'test' point to the new split dirs (e.g. .../images/train, .../images/val, .../images/test)  
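The YAML update itself can be sketched roughly as follows (paths and function name are illustrative; the notebook's actual update logic may differ):

```python
import yaml

def update_dataset_yaml(yaml_path, dataset_root):
    """Point the dataset config's path/train/val/test at the split directories."""
    with open(yaml_path) as f:
        data = yaml.safe_load(f)
    data["path"] = dataset_root        # dataset root on the UC Volume
    data["train"] = "images/train"     # relative to 'path'
    data["val"] = "images/val"
    data["test"] = "images/test"
    with open(yaml_path, "w") as f:
        yaml.safe_dump(data, f, sort_keys=False)
    return data
```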

Important note: COCO128 is used here only for demonstration. With ~128 images it is too small for production and will overfit. For real use cases, use larger datasets (e.g. 100K+ images or 1K+ domain-specific images). The same workflow applies—update data paths and config as needed.

 

  3. MLflow: Custom PyFunc Wrapper and Configuration

To deploy the trained YOLO model to Model Serving, we need a single, serializable API: the endpoint will receive requests (e.g., base64-encoded images) and return structured responses (e.g., bounding boxes). YOLO’s native API expects file paths or NumPy arrays and returns a rich in-memory object, which is not what the serving layer expects.

The notebook therefore defines an MLflow custom PyFunc wrapper, YOLOWrapper(mlflow.pyfunc.PythonModel), that accepts a DataFrame with an image_base64 column and returns a DataFrame of detections (class, confidence, bbox columns).

The wrapper class has three methods:

  • load_context loads the .pt artifact into self.model when the model is loaded (e.g. at serving startup);
  • predict accepts a DataFrame with image_base64, decodes each image, runs YOLO, and returns a DataFrame via _format_predictions;
  • _format_predictions converts YOLO’s Results (.boxes, .names) into a single DataFrame with class name, class id, confidence, and bbox columns (xyxy and xywh).

We define the wrapper now so it's ready to use immediately after training completes in Step 4.

class YOLOWrapper(mlflow.pyfunc.PythonModel):
    """Custom MLflow wrapper for YOLO models using base64-encoded images."""  

    def load_context(self, context):
        """Load YOLO model from artifacts (called once when model is loaded)."""  
        from ultralytics import YOLO  
        model_path = context.artifacts["yolo_model"]  
        self.model = YOLO(model_path, task='detect')  

    def _format_predictions(self, predictions):
        """Convert YOLO Results to a single DataFrame with class, confidence, bbox columns."""  
        import pandas as pd  
        all_results = []  
        for prediction in predictions:  
            if prediction.boxes is not None:  
                boxes = prediction.boxes  
                for i in range(len(boxes)):  
                    box_xyxy = boxes.xyxy[i].cpu().numpy()  
                    box_xywh = boxes.xywh[i].cpu().numpy()  
                    all_results.append({   
                        "class_name": prediction.names[int(boxes.cls[i])],  
                        "class_num": int(boxes.cls[i]),  
                        "confidence": float(boxes.conf[i]),  
                        "bbox_x1": float(box_xyxy[0]), "bbox_y1": float(box_xyxy[1]),  
                        "bbox_x2": float(box_xyxy[2]), "bbox_y2": float(box_xyxy[3]),  
                        "bbox_center_x": float(box_xywh[0]), "bbox_center_y": float(box_xywh[1]),  
                        "bbox_width": float(box_xywh[2]), "bbox_height": float(box_xywh[3]),  
                    })  
        return pd.DataFrame(all_results)  

    def predict(self, context, model_input):
        """Accept DataFrame with image_base64; decode, run YOLO, return DataFrame of detections."""
        import pandas as pd  
        import base64  
        from PIL import Image  
        import io  
        import numpy as np

        if 'image_base64' not in model_input.columns:  
            raise ValueError("DataFrame must contain 'image_base64' column")  
        all_predictions = []  

        for image_base64 in model_input['image_base64'].tolist():  
            image_bytes = base64.b64decode(image_base64)  
            image_array = np.array(Image.open(io.BytesIO(image_bytes)))  
            predictions = self.model.predict(image_array, verbose=False)  
            all_predictions.extend(predictions)

        return self._format_predictions(all_predictions)
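Before wiring the wrapper into serving, you can sanity-check the input contract it expects — a DataFrame whose image_base64 column decodes back to a valid image array. A standalone sketch with a synthetic image (no YOLO model required):

```python
import base64
import io

import numpy as np
import pandas as pd
from PIL import Image

# Build a small synthetic RGB image and encode it the way a client would
image = Image.new("RGB", (64, 48), color=(255, 0, 0))
buffer = io.BytesIO()
image.save(buffer, format="JPEG")
image_base64 = base64.b64encode(buffer.getvalue()).decode("utf-8")

# This is exactly the DataFrame shape YOLOWrapper.predict expects
input_df = pd.DataFrame({"image_base64": [image_base64]})

# Replicate the wrapper's decode step and confirm a valid HxWxC array comes back
decoded = np.array(Image.open(io.BytesIO(base64.b64decode(input_df["image_base64"][0]))))
print(decoded.shape)  # (48, 64, 3)
```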

3.1 MLflow Configuration

We infer the model signature from a sample prediction using this custom wrapper (DataFrame with image_base64 input and detection columns output). We also set the MLflow experiment (e.g., under /Workspace/Shared/) and enable system metrics logging, along with YOLO’s MLflow integration and MLflow autologging.

# Infer signature from a sample image (input: base64, output: bbox columns)   
signature, input_example = infer_model_signature(model_path, sample_images[0])  
  
# Enable system metrics and set experiment   
experiment_name, experiment_id = setup_mlflow_experiment(  
    use_workspaceUsers_path=False,  
    expt_name_suffix="Experiments_YOLO_CoCo"  
)
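infer_model_signature is a notebook helper; conceptually, it runs the wrapper on one sample image and derives the signature from the resulting input and output frames. The frames it works from look roughly like this (values are illustrative; column names match the wrapper above):

```python
import pandas as pd

# Input side of the contract: one base64-encoded image per row
sample_input = pd.DataFrame({"image_base64": ["<base64-encoded JPEG bytes>"]})

# Output side: one row per detection, mirroring _format_predictions
sample_output = pd.DataFrame([{
    "class_name": "person",
    "class_num": 0,
    "confidence": 0.91,
    "bbox_x1": 10.0, "bbox_y1": 20.0, "bbox_x2": 110.0, "bbox_y2": 220.0,
    "bbox_center_x": 60.0, "bbox_center_y": 120.0,
    "bbox_width": 100.0, "bbox_height": 200.0,
}])

# In the notebook, something like:
#   from mlflow.models import infer_signature
#   signature = infer_signature(sample_input, sample_output)
```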

The model is then registered to Unity Catalog (mlflow.pyfunc.log_model) using this wrapper and the best checkpoint (.pt) artifact; this call is made after training in Step 4:

mlflow.pyfunc.log_model(  
    name="model",  
    python_model=YOLOWrapper(),  
    artifacts={"yolo_model": model_path},  
    signature=signature,  
    input_example=input_example,  
    registered_model_name=registered_model_name,  
    pip_requirements=["ultralytics==...", "cloudpickle==...", "torch", "torchvision", "pillow", "numpy"],  
)

 

  4. Model training

Train YOLO11n with your chosen hyperparameters (epochs, batch size, learning rate, patience, dropout, weight decay); these are specified in the config variables within model.train() as shown in the code snippet. Training runs in a unique temp directory, and the results and validation metrics are copied into the volume under a named run folder ({task}_{model}_{dataset}_{timestamp}_run_{run_id}). The best checkpoint is saved and is then registered to Unity Catalog with the custom PyFunc wrapper (base64 in, structured detections out) defined in the previous step. 

model = YOLO(model_path)
results = model.train(
    task="detect",
    batch=4,
    device=0,                    # Single GPU for Serverless AI Runtime  
    data=data_yaml_path,
    epochs=100,
    lr0=0.001,
    project=project_location,
    name=f"run_{timestamp}",
    patience=5,                  # Update where appropriate     
    dropout=0.2,
    weight_decay=0.0005,
    save=True,
)
run_id = mlflow.last_active_run().info.run_id

# Register to Unity Catalog with custom PyFunc wrapper (base64 in, bbox out) 
registered_model_name = register_yolo_model(
    run_id=run_id,
    model_path=best_model_path,
    catalog_name=catalog_name,
    schema_name=schema_name,
    model_name=model_name,
    signature=signature,
    input_example=input_example,
    data_yaml_path=data_yaml_path,
)

 

  5. Model evaluation

Evaluate the registered model on validation and test sets (sample predictions and metrics), then run a local serving test by loading the model via mlflow.pyfunc.load_model() and calling it with base64-encoded images to confirm the same interface the endpoint will use.

# Local serving test: same I/O as the deployed endpoint 
model_uri = f"models:/{registered_model_name}/{latest_version}" 
serving_model = mlflow.pyfunc.load_model(model_uri)

with open(test_image_path, 'rb') as f:
    image_base64 = base64.b64encode(f.read()).decode('utf-8')
input_df = pd.DataFrame({"image_base64": [image_base64]})
predictions = serving_model.predict(input_df)  # DataFrame with class_name, confidence, bbox_*
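As a quick visual check, the returned bbox columns can be drawn back onto the source image. A minimal sketch using Pillow (function name hypothetical; assumes the predictions DataFrame shape shown above):

```python
import pandas as pd
from PIL import Image, ImageDraw

def draw_detections(image, predictions, min_confidence=0.5):
    """Draw labeled boxes from the wrapper's output DataFrame onto a PIL image."""
    draw = ImageDraw.Draw(image)
    for _, row in predictions.iterrows():
        if row["confidence"] < min_confidence:
            continue  # skip low-confidence detections
        box = (row["bbox_x1"], row["bbox_y1"], row["bbox_x2"], row["bbox_y2"])
        draw.rectangle(box, outline="red", width=2)
        draw.text((row["bbox_x1"], row["bbox_y1"] - 10),
                  f'{row["class_name"]} {row["confidence"]:.2f}', fill="red")
    return image

# e.g. draw_detections(Image.open(test_image_path), predictions).save("annotated.jpg")
```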

 

  6. Model deployment

After a manual checkpoint (e.g. a “Proceed with Deployment” widget), you can create or update a Custom Model Serving endpoint. The deployment configuration includes: 

  • WorkspaceClient SDK – enables programmatic endpoint management, ensuring deployments are repeatable, version-controlled, and integrated into automated workflows; 
  • Small endpoint workload compute size and enabling scale-to-zero — minimizes compute costs during development and evaluation by provisioning resources on demand and releasing them when the endpoint is idle; 
  • AI Gateway inference tables — automatically logs all request and response payloads to a Unity Catalog delta table, providing a built-in audit trail for monitoring, debugging, and downstream evaluation without additional instrumentation. 

6.1 Create the endpoint with AI Gateway enabled:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    ServedEntityInput, EndpointCoreConfigInput,
    AiGatewayConfig, AiGatewayInferenceTableConfig,
)

w = WorkspaceClient()
w.serving_endpoints.create( 
    name=endpoint_name,
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name=registered_model_name,
                entity_version=str(model_version),
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
    ai_gateway=AiGatewayConfig(
        inference_table_config=AiGatewayInferenceTableConfig(
            catalog_name=catalog_name,
            schema_name=schema_name,
            table_name_prefix=endpoint_name,
            enabled=True,
        )
    ),
)

Test the deployed endpoint by calling it with a base64-encoded image and verify the structured bounding-box response.

6.2 Call the endpoint with base64 input (same as local PyFunc test):

import base64

with open(test_image_path, 'rb') as f:
    image_base64 = base64.b64encode(f.read()).decode('utf-8')


response = w.serving_endpoints.query(
    name=endpoint_name,
    dataframe_records=[{"image_base64": image_base64}],
)

# Response contains DataFrame with class_name, confidence, bbox_x1, bbox_y1, ... 
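The response can be turned back into the same detections DataFrame for downstream use. A sketch, assuming the endpoint returns its predictions as a list of per-detection records (the exact response shape depends on the client and SDK version):

```python
import pandas as pd

# Illustrative payload in the shape the wrapper produces (one record per detection)
predictions_payload = [
    {"class_name": "person", "class_num": 0, "confidence": 0.91,
     "bbox_x1": 10.0, "bbox_y1": 20.0, "bbox_x2": 110.0, "bbox_y2": 220.0},
    {"class_name": "dog", "class_num": 16, "confidence": 0.34,
     "bbox_x1": 5.0, "bbox_y1": 5.0, "bbox_x2": 50.0, "bbox_y2": 60.0},
]
# e.g. predictions_payload = response.predictions  (with the SDK response above)

detections = pd.DataFrame(predictions_payload)
confident = detections[detections["confidence"] >= 0.5]
print(confident["class_name"].tolist())  # ['person']
```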

Successful testing confirms that the custom PyFunc wrapper handles base64-encoded image input and returns structured bounding-box output from the Model Serving endpoint; the key technical considerations behind this design are detailed next.

 

Key technical details

A few details are worth calling out for implementation and operations:

  • Base64 input: The custom MLflow PyFunc wrapper accepts a DataFrame with an image_base64 column, where images are encoded as base64 strings.  In our example, JPEG images are used; other formats (e.g. PNG, BMP) may work but have not been validated here. Base64 encoding keeps the API simple and works across network boundaries for Model Serving.
  • Bounding box output: The wrapper returns a DataFrame with columns such as class_name, class_num, confidence, bbox_x1, bbox_y1, bbox_x2, bbox_y2, and center/width/height. This structure is inferred once and used for the registered model’s signature.
  • Why we use a custom wrapper: YOLO’s native API expects image paths or NumPy arrays and returns a rich in-memory object; Model Serving expects a single, serializable contract (DataFrame input and output) for HTTP requests and responses. The wrapper (1) accepts base64-encoded images in a DataFrame column (JSON-friendly), (2) loads the .pt artifact and runs the YOLO object detection task inside predict, and (3) returns a structured DataFrame of detections that the endpoint can serialize. Without it, the raw YOLO model could not be deployed as a standard PyFunc. The full implementation is in Step 3 above.
  • Unity Catalog Volume structured layout: Data is stored in /Volumes/{catalog}/{schema}/{volume}/data/. Pretrained weights are located in raw_model/. Each training run has its own dedicated folder under runs/{task}_{model}_{dataset}_{timestamp}_run_{run_id}/, which includes subfolders for train/validation_metrics/, validation_samples/, and test_samples/.
  • Deployment safety: A parameter widget (e.g. “Proceed with Deployment”) gates the deployment cells so “Run All” doesn’t deploy by accident. Endpoint creation/update can take on the order of 10–20 minutes; the notebook can exit early and direct you to re-run or check the UI.
  • AI Gateway: New endpoints are created with AI Gateway inference table config (catalog, schema, table name prefix). The payload table is created and populated after the first requests; there can be a short delay before rows appear. You can query the table (e.g. SELECT * FROM catalog.schema.endpoint_payload ORDER BY timestamp_ms DESC) to inspect logged requests.

 

Conclusion

The ability to train or fine-tune and deploy YOLO (You Only Look Once) models on the Databricks Data Intelligence Platform provides enterprises with a high-performance, cost-optimized, and easily adoptable Computer Vision (CV) solution. Our walkthrough shows a complete path from raw images to a live YOLO endpoint on Databricks: no cluster provisioning, full MLflow tracking, Unity Catalog governance, and production-ready serving with built-in request logging. Swap COCO128 for your own dataset and the same workflow applies. As your data or model complexity grows, the same registration and deployment steps extend to multi-GPU and multi-node training patterns.

 

mmt_3-1774033788904.png

Figure 4: Example validation of YOLO object detection inference on a sample of COCO128 images

 

Next Steps: 

Here's how you can try this out:

  • Single-Node YOLO Example: Clone the cv-playground repository and run the notebook in your Databricks workspace. Attach it to an AI Runtime (A10) with the AI v4 environment. After running the cells, swap the COCO128 dataset for your own data and paths.
  • Larger-Scale Instance Segmentation: For a bigger example using YOLO, check out the NuInsSeg project within the same repository.
  • Monitor and Improve: Use the AI Gateway inference table to monitor model traffic, debug inputs and outputs, and feed insights back into your analytics or model improvement pipeline.

 

 Stay tuned for the follow-up post on multi-GPU and multi-node YOLO model training on AI Runtime!

 

[1] Ultralytics YOLO is dual-licensed: AGPL-3.0 (default) or Enterprise for commercial use. Users should review https://www.ultralytics.com/license to determine which applies to their use case.