chandhana_p · 02-02-2026

Overview

Manufacturers price millions of parts across product families, geographies, sales units, and channels. These differences often require dozens of specialized ML models, each optimized for a particular part segment, sales region, or business unit. While this blog uses manufacturing as a case study, the dynamic model routing pattern applies broadly to any application that needs a single endpoint to select among multiple ML models based on request attributes.

With a nod to Tim Peters’ Zen of Python (“explicit is better than implicit”), we first define the term, since it is used in different ways. Dynamic Model Routing is a pattern in which a system explicitly chooses the best-suited model for each request at runtime, based on context, cost, performance, or other domain-specific criteria. We show both the logical architecture and the implementation strategy for this pattern on Databricks.

In pricing for manufacturing, this matters because pricing models rarely evolve at the same pace: some part families demand frequent retraining due to volatile demand or competitive pressure, while others change slowly and stay stable for months. A modular architecture lets each model group evolve independently—retraining when needed, deploying new versions, and leaving the rest untouched—while applications enjoy a simple contract: one API where they submit part information and receive forecasted prices.

In other words, your architecture can be as complex as it needs to be behind the scenes, but your users still get to say: “Give me one API to submit part information and get back forecasted prices”—and never worry about which model did the work.

A dynamic modular architecture supports:

One shared endpoint for all applications.
Automatic model selection via routing logic.
Real-time & batch inference with the same code.
Autoscaling, concurrency, and lookup performance handled entirely by Databricks.

This blog walks through how to build a Dynamic Model Router on Databricks, powered by:

Unity Catalog for governance of models, features, and routing configuration (to ensure isolation, lineage, auditability, and controlled promotions/rollbacks).
Model Serving for scalable, low-latency inference.
Feature Store + Online Feature Store for feature consistency and enrichment (with millisecond lookups needed in real-time parts-pricing scenarios).
MLflow for versioning, lifecycle management, and version isolation for each model group.

1. Many Applications, Many Requests, Many Models — One Unified Endpoint

A global pricing ecosystem typically has:

Multiple product families, each with different pricing dynamics.
Different unit categories (retail, fleet, bulk, region-specific).
Regional and channel-specific pricing rules.
Frequent retraining, requiring version isolation for each model group.

Even with 20+ pricing models behind the scenes, the consuming applications expect:

One request → One API.
Flexible input payloads.
Real-time responses.
No knowledge of which underlying model generated the prediction.

Note on terminology: “pricing models” refers to ML models specialized for specific manufacturing slices (for example, segment × region × channel), each mapped 1:1 to a deployable model that can be versioned and promoted independently.

Example: Multi-Input Requests from Multiple Applications

Application 1

{
  "part": "P-1001",
  "product_segment": "SEG12",
  "unit_category": "UNIT_A1",
  "order": "O-1001",
  "region": "US"
}

Application 2

{
  "part": "P-1002",
  "product_segment": "SEG19",
  "unit_category": "UNIT_B2",
  "order": "O-1005",
  "region": "US"
}

Application 3

{
  "part": "P-1006",
  "product_segment": "SEG19",
  "unit_category": "UNIT_E5",
  "order": "O-1005",
  "region": "EU"
}
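Since every application sends the same core fields, the router can normalize each payload into one typed record before routing. The following is a minimal sketch, not part of the original implementation: the `PricingRequest` dataclass and `parse_request` helper are hypothetical names, and treating `order` and `region` as optional is an assumption.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PricingRequest:
    """Hypothetical typed view of one pricing request payload."""
    part: str
    product_segment: str
    unit_category: str
    order: Optional[str] = None   # assumed optional
    region: Optional[str] = None  # assumed optional


def parse_request(payload: dict) -> PricingRequest:
    """Validate a raw payload and convert it to a PricingRequest."""
    known = {f.name for f in fields(PricingRequest)}
    # Fields without a default value are treated as required.
    required = [f.name for f in fields(PricingRequest) if f.default is MISSING]
    missing = [name for name in required if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Unknown keys are ignored so applications can send extra attributes.
    return PricingRequest(**{k: v for k, v in payload.items() if k in known})


app3 = {
    "part": "P-1006",
    "product_segment": "SEG19",
    "unit_category": "UNIT_E5",
    "order": "O-1005",
    "region": "EU",
}
request = parse_request(app3)
```

Ignoring unknown keys is one way to keep input payloads flexible while still rejecting requests that cannot be routed.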

2. End-to-End Architecture

Logical Architecture: Dynamic Model Router Responsibilities

Enriches inputs with Feature Store lookup.
Applies routing rules.
Groups requests by model key.
Calls the correct pricing model endpoint(s) in parallel and reassembles results in the original request order.

This router endpoint provides the orchestration layer (enrich → route → batch → fan-out → reassemble) so applications always call one endpoint.
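The fan-out and reassembly steps can be sketched with a thread pool. This is an illustration under stated assumptions, not the production router: `fake_endpoint` and `FAKE_PRICES` are hypothetical stand-ins for real Model Serving calls, and each endpoint is assumed to return predictions in the same order as the records it received.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical prices per model, standing in for real endpoint predictions.
FAKE_PRICES = {"R1_SEG12": 17.42, "R2_SEG12": 23.10}


def fake_endpoint(model_key, records):
    """Stand-in for a Model Serving call: one prediction per input record."""
    return [
        {"part": r["part"], "forecast_price": FAKE_PRICES[model_key],
         "source_model": model_key}
        for r in records
    ]


def fan_out(batches, call_endpoint, max_workers=4):
    """Call every model endpoint concurrently and restore request order.

    batches maps model_key -> list of (row_id, record) pairs.
    """
    def run(item):
        model_key, rows = item
        preds = call_endpoint(model_key, [record for _, record in rows])
        # Pair each prediction with the row id of the record that produced it.
        return [(row_id, pred) for (row_id, _), pred in zip(rows, preds)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(run, batches.items()))
    pairs = [pair for chunk in chunks for pair in chunk]
    # Sorting by row id puts results back in the original request order.
    return [pred for _, pred in sorted(pairs, key=lambda pair: pair[0])]


batches = {
    "R1_SEG12": [(0, {"part": "P-1001"}), (2, {"part": "P-3003"})],
    "R2_SEG12": [(1, {"part": "P-2002"})],
}
ordered = fan_out(batches, fake_endpoint)
```

Carrying the row id alongside each record means the caller never depends on which endpoint answers first.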

Core Databricks Components: Feature Store & Online Feature Store

Store and serve features such as product_segment and unit_category; provide low-latency lookups for real-time serving; guarantee consistency between training and inference.
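A rough picture of the enrichment step, assuming a plain dictionary as a stand-in for the Online Feature Store. The `ONLINE_FEATURES` mapping and `enrich` helper are hypothetical, and letting stored values override request values is a design assumption made here so that routing sees the same features used in training.

```python
# Hypothetical in-memory stand-in for the Online Feature Store, keyed by part.
ONLINE_FEATURES = {
    "P-1001": {"product_segment": "SEG12", "unit_category": "UNIT_A1"},
    "P-1006": {"product_segment": "SEG19", "unit_category": "UNIT_E5"},
}


def enrich(request, store=ONLINE_FEATURES):
    """Return a copy of the request with stored features added.

    Stored values take precedence over values sent in the request.
    Parts that are not in the store are returned unchanged.
    """
    looked_up = store.get(request["part"], {})
    return {**request, **looked_up}


enriched = enrich({"part": "P-1006", "region": "EU"})
```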

Pricing Models: Distinct Logical Components in Dynamic Model Routing

Each pricing model is a first-class component with a clear responsibility and lifecycle: it can be retrained, versioned, and promoted independently, and is deployed as its own autoscaled, Unity-Catalog–governed Model Serving endpoint. The serving layer then exposes all of these endpoints through a single pricing interface, so consuming applications have one unified API while many specialized models work behind the scenes.

Example: Response

[
  {"part": "P-1001", "forecast_price": 17.42, "source_model": "R1_SEG12"},
  {"part": "P-1006", "forecast_price": 23.10, "source_model": "R2_SEG12"}
]

3. Generic Routing Logic for a Pricing Use Case

To determine which pricing model should process a given request, the router relies on the following attributes, defined by business stakeholders:

part — part number.
product_segment — grouping related parts into families.
unit_category — describes how parts are sold (retail, fleet, channel-specific, etc.).
order — order number.
region — US, CAN, EU, etc.

These are stored in the Feature Store and are accessible in real time via the Online Feature Store.

Example: Routing Table

Model Key	product_segment	unit_category	part	order	region
R1_SEG12	"SEG12, SEG19"	"UNIT_A1, UNIT_B2, UNIT_C3"	P-1001	O-1001	US
R2_SEG12	"SEG12, SEG19"	"UNIT_D4, UNIT_E5, UNIT_F6"	P-1005	O-1007	EU

Example: Routing Logic

IF product_segment IN ('SEG12', 'SEG19')
  AND unit_category IN ('UNIT_A1', 'UNIT_B2', 'UNIT_C3')
  AND part = 'P-1001'
  AND order = 'O-1001'
  AND region = 'US'
THEN use model R1_SEG12
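The same rule can be expressed as data plus a small matching function. This is a minimal sketch: the `ROUTING_RULES` layout, the `when` key, and `select_model_key` are hypothetical names chosen for illustration, with the conditions copied from the routing table rows.

```python
# Each rule lists, per attribute, the set of values it accepts.
ROUTING_RULES = [
    {
        "model_key": "R1_SEG12",
        "when": {
            "product_segment": {"SEG12", "SEG19"},
            "unit_category": {"UNIT_A1", "UNIT_B2", "UNIT_C3"},
            "part": {"P-1001"},
            "order": {"O-1001"},
            "region": {"US"},
        },
    },
    {
        "model_key": "R2_SEG12",
        "when": {
            "product_segment": {"SEG12", "SEG19"},
            "unit_category": {"UNIT_D4", "UNIT_E5", "UNIT_F6"},
            "part": {"P-1005"},
            "order": {"O-1007"},
            "region": {"EU"},
        },
    },
]


def select_model_key(request, rules=ROUTING_RULES, default=None):
    """Return the model key of the first rule whose conditions all hold."""
    for rule in rules:
        conditions = rule["when"]
        if all(request.get(attr) in allowed for attr, allowed in conditions.items()):
            return rule["model_key"]
    return default


app1 = {
    "part": "P-1001",
    "product_segment": "SEG12",
    "unit_category": "UNIT_A1",
    "order": "O-1001",
    "region": "US",
}
chosen = select_model_key(app1)
```

Keeping the conditions in data rather than in code means a rule can drop an attribute (for example `part`) simply by omitting it from `when`.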

Each routing key (e.g., R1_SEG12) maps to a model in Unity Catalog, such as:

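Model_uri -> models:/pricing_model_R1_SEG12/Production. Models registered in Unity Catalog are addressed by a three-level name and an alias rather than a stage, in the form models:/<catalog>.<schema>.<model>@<alias>, so the exact URI depends on where the model is registered. On Databricks, Unity Catalog governance spans models, features, and routing configuration, providing the lineage and permissions needed for safe, controlled changes.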
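A small helper can make the key-to-model mapping explicit. This is a sketch under assumptions: the catalog `main`, schema `supply_chain`, and alias `champion` are hypothetical defaults, and the `pricing_model_` prefix follows the naming shown above.

```python
def model_uri_for(model_key, catalog="main", schema="supply_chain", alias="champion"):
    """Build a Unity Catalog model URI for a routing key.

    The catalog, schema, and alias defaults are illustrative assumptions.
    """
    return f"models:/{catalog}.{schema}.pricing_model_{model_key}@{alias}"


uri = model_uri_for("R1_SEG12")
```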

4. How the Router Decides Which Model to Call

Example requests:

[
  {"part": "P-1001", "product_segment": "SEG12", "unit_category": "UNIT_A1"},
  {"part": "P-2002", "product_segment": "SEG12", "unit_category": "UNIT_D4"},
  {"part": "P-3003", "product_segment": "SEG12", "unit_category": "UNIT_A1"}
]

Routing Result

part	segment	category	model
P-1001	SEG12	UNIT_A1	R1
P-2002	SEG12	UNIT_D4	R2
P-3003	SEG12	UNIT_A1	R1

Router Batches

Batch R1 → P-1001, P-3003.
Batch R2 → P-2002.

Each batch is sent to the correct Model Serving endpoint.
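The batching step amounts to grouping requests by model key while remembering each request's position. A minimal sketch, assuming a hypothetical `by_unit_category` selector that mirrors the routing result above:

```python
from collections import defaultdict


def by_unit_category(request):
    """Hypothetical selector reproducing the routing result in this section."""
    r1_units = {"UNIT_A1", "UNIT_B2", "UNIT_C3"}
    return "R1" if request["unit_category"] in r1_units else "R2"


def group_by_model(requests, select_model_key):
    """Group requests into per-model batches of (row_id, request) pairs."""
    batches = defaultdict(list)
    for row_id, request in enumerate(requests):
        batches[select_model_key(request)].append((row_id, request))
    return dict(batches)


requests = [
    {"part": "P-1001", "product_segment": "SEG12", "unit_category": "UNIT_A1"},
    {"part": "P-2002", "product_segment": "SEG12", "unit_category": "UNIT_D4"},
    {"part": "P-3003", "product_segment": "SEG12", "unit_category": "UNIT_A1"},
]
batches = group_by_model(requests, by_unit_category)
```

The row ids stored with each request are what later allow results to be returned in the original order.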

5. Router Implementation Using MLflow & Model Serving

Here is the simplified router class:

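import mlflow
import pandas as pd
from databricks.feature_engineering import FeatureEngineeringClient
from databricks.sdk import WorkspaceClient


class PricingRouter(mlflow.pyfunc.PythonModel):
    def __init__(self, routing_config, feature_table):
        # Each rule needs "model_key", "endpoint_name", "product_segments",
        # and "unit_categories".
        self.routing_config = routing_config
        self.feature_table = feature_table

    def load_context(self, context):
        # Create clients at load time so they are not pickled with the model.
        self.fe = FeatureEngineeringClient()
        self.ws = WorkspaceClient()

    def _enrich_features(self, df):
        # Simplified lookup: read_table returns a Spark DataFrame, so this path
        # needs a Spark session. A real-time router would read the same
        # features from the Online Feature Store instead.
        part_ids = df["part_id"].unique().tolist()
        table = self.fe.read_table(name=self.feature_table)
        lookup = table.filter(table["part_id"].isin(part_ids)).toPandas()
        # Only add columns the request does not already carry.
        new_cols = [c for c in lookup.columns if c not in df.columns]
        return df.merge(lookup[["part_id"] + new_cols], on="part_id", how="left")

    def _select_model_key(self, row):
        # The first matching rule wins; unmatched rows get the "default" key.
        for rule in self.routing_config:
            if (row["product_segment"] in rule["product_segments"]
                    and row["unit_category"] in rule["unit_categories"]):
                return rule["model_key"]
        return "default"

    def _call_endpoint(self, endpoint_name, payload):
        resp = self.ws.serving_endpoints.query(
            name=endpoint_name,
            dataframe_records=payload
        )
        return pd.DataFrame(resp.predictions)

    def predict(self, context, model_input):
        df = self._enrich_features(model_input.copy())
        df["__row_id__"] = range(len(df))
        df["routing_key"] = df.apply(self._select_model_key, axis=1)

        endpoints = {rule["model_key"]: rule["endpoint_name"]
                     for rule in self.routing_config}
        outputs = []
        # Batches are scored one after another here to keep the example short.
        for key, group in df.groupby("routing_key"):
            if key not in endpoints:
                raise ValueError(f"No endpoint configured for routing key '{key}'")
            # Send only model inputs, not the router's bookkeeping columns.
            payload = (group.drop(columns=["__row_id__", "routing_key"])
                       .to_dict("records"))
            preds = self._call_endpoint(endpoints[key], payload)
            # Assumes the endpoint returns one prediction per record, in order.
            preds["__row_id__"] = group["__row_id__"].values
            preds["source_model"] = key
            outputs.append(preds)

        # Restore the original request order before returning.
        return (pd.concat(outputs)
                .sort_values("__row_id__")
                .drop(columns=["__row_id__"]))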
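The router reads four keys from every rule in `routing_config`. A sketch of what such a configuration might look like, with a small check run before logging the model: the endpoint names and the `validate_rules` helper are hypothetical, and the segment and category values are taken from the routing table.

```python
REQUIRED_KEYS = {"model_key", "endpoint_name", "product_segments", "unit_categories"}

# Endpoint names below are illustrative assumptions.
my_routing_rules = [
    {
        "model_key": "R1_SEG12",
        "endpoint_name": "pricing-r1-seg12",
        "product_segments": ["SEG12", "SEG19"],
        "unit_categories": ["UNIT_A1", "UNIT_B2", "UNIT_C3"],
    },
    {
        "model_key": "R2_SEG12",
        "endpoint_name": "pricing-r2-seg12",
        "product_segments": ["SEG12", "SEG19"],
        "unit_categories": ["UNIT_D4", "UNIT_E5", "UNIT_F6"],
    },
]


def validate_rules(rules):
    """Raise ValueError if a rule lacks a key or a model_key repeats."""
    seen = set()
    for i, rule in enumerate(rules):
        missing = REQUIRED_KEYS - set(rule)
        if missing:
            raise ValueError(f"rule {i} is missing {sorted(missing)}")
        if rule["model_key"] in seen:
            raise ValueError(f"duplicate model_key {rule['model_key']!r}")
        seen.add(rule["model_key"])
    return rules


validated = validate_rules(my_routing_rules)
```

Catching a malformed rule at configuration time is cheaper than discovering it when a request fails to route.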

Deployment with MLflow

router = PricingRouter(
    routing_config=my_routing_rules,
    feature_table="main.supply_chain.part_features"
)

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        "pricing_router",
        python_model=router,
        input_example=pd.DataFrame({"part_id": ["P-1001"]})
    )
# Deploy to Databricks Model Serving.

Databricks automatically handles autoscaling, concurrency, endpoint isolation, failover, versioning, and security.

6. Batch Forecasting Using Spark

Use the same router logic for batch scoring:

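import mlflow
from pyspark.sql import functions as F

# Wrap the registered router as a Spark UDF; `spark` is the active SparkSession.
router_udf = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="models:/pricing_router/Production",
    result_type="struct<forecast_price:double, source_model:string>"
)

# Score every open quote and keep the prediction as a struct column.
parts_df = spark.table("main.supply_chain.open_quotes")
results = parts_df.withColumn(
    "prediction",
    router_udf(F.struct("part_id"))
)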

Batch and real-time remain aligned; one artifact, two modes.

7. Why Databricks is a Strong Fit

Fully Managed Model Serving - No Kubernetes configuration; no API gateways; built-in autoscaling; high concurrency + high throughput.
Online Feature Store Integration - Millisecond-level feature lookup; same features in training + inference to reduce drift.
Unity Catalog Governance - One security model; lineage across models, features, and tables; auditing and access control; governed routing config for safe changes.
MLflow for Model Lifecycle - Versioning; reproducible deployments; multi-model management at scale.
Model Isolation + Modular Retraining - Each model retrained & deployed without touching other model groups; refresh low-quality or drifting models independently; router automatically routes to latest promoted versions.
Multi-Model Architecture, One Endpoint - Router orchestrates all routing; new model = just a config update; no app changes.
Real-Time + Batch in Harmony - Same logic everywhere; zero code duplication.

8. Closing Thoughts

Pricing across a global supply chain requires multiple ML models, but applications shouldn't inherit that complexity. A Dynamic Model Router on Databricks places many models behind a single endpoint, giving applications a clean API along with governed model management, consistent features, low-latency real-time pricing, and scalable batch processing. Databricks handles the infrastructure, so teams can focus on pricing intelligence rather than on operating the system.

Dynamic Model Routing at Scale with Databricks Model Serving and Feature Store
