how to speed up inference?

jeremy98 — Thu, 23 Oct 2025 21:09:19 GMT

Hi guys,

I'm new to this concept, but we have several ML models that share the same code structure. What I don't fully understand is how to handle different types of models efficiently. Right now, I need to loop through my items to get the inference for each model, and each one requires a specific inference process.
How can I speed this up using some kind of batch inference step?

Re: how to speed up inference?

mark_ott — Fri, 24 Oct 2025 14:05:42 GMT

In Databricks, the most efficient way to handle multiple machine learning models for inference — especially when each model has its own inference logic — is to use batch inference with Spark DataFrames and Pandas UDFs. Instead of looping over your models sequentially in Python, you can parallelize inference across your data and model configurations using Spark’s distributed capabilities.

Batch Inference with Spark DataFrames

Databricks recommends structuring your data in a Spark DataFrame, where each row represents an item for prediction and may include metadata indicating which model to use. The workflow typically includes:

Loading your data into a Spark DataFrame (from Unity Catalog, Delta tables, or external sources).
Loading your models from the MLflow Model Registry.
Creating Spark UDFs for inference using mlflow.pyfunc.spark_udf().
Applying the UDFs to your DataFrame to generate predictions in bulk.

Example:

python

import mlflow
from pyspark.sql import functions as F

predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="models:/my_model/Production")
df = df.withColumn("prediction", predict_udf(*df.columns))
df.write.mode("overwrite").saveAsTable("predictions_output")

This approach allows Spark to distribute inference tasks across multiple executors, avoiding Python’s sequential bottlenecks.
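One caveat with the snippet above: `predict_udf(*df.columns)` passes every column to the model, including IDs or routing metadata. Below is a minimal sketch of one way to pass only the feature columns. The helper names `feature_columns` and `score_table`, the `id_cols` argument, and the `"double"` result type are assumptions for illustration, not part of the original answer.

```python
from typing import Iterable, List


def feature_columns(all_columns: Iterable[str], exclude: Iterable[str]) -> List[str]:
    """Return the columns to feed the model, preserving their original order."""
    excluded = set(exclude)
    return [c for c in all_columns if c not in excluded]


def score_table(spark, table_name: str, model_uri: str, id_cols: Iterable[str]):
    """Score a table with one model, passing only feature columns to the UDF.

    Hypothetical helper: assumes the model returns one numeric value per row.
    """
    # Imported lazily so the pure helper above can be used without a cluster.
    import mlflow
    from pyspark.sql import functions as F

    df = spark.table(table_name)
    predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type="double")
    features = feature_columns(df.columns, id_cols)
    # Wrap the feature columns in a struct so the model receives them by name.
    return df.withColumn("prediction", predict_udf(F.struct(*features)))
```

Keeping the column selection in a plain Python function makes it easy to check locally before running anything on a cluster.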

Parallel Multi-Model Inference

When dealing with multiple models (e.g., per client or product), you can run batch inference in parallel with groupBy().applyInPandas(), Spark's grouped-map pandas function. Spark hands each group to the function as a pandas DataFrame, and groups are processed as separate tasks, so different workers can handle inference for different models. This lets you:

Load each model once per group instead of once per row.
Process subsets of data in parallel.

Example pattern:

python

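import mlflow
from pyspark.sql.types import DoubleType, StructField, StructType

def run_inference(pdf):
    # Each group holds the rows for one model, so the model is loaded once per group.
    model_path = pdf["model_path"].iloc[0]
    model = mlflow.pyfunc.load_model(model_path)
    # Assumes the model accepts the 'features' column as input and returns one number per row.
    pdf["prediction"] = model.predict(pdf["features"])
    return pdf

# The output schema must include the new column; df.schema alone would not match.
out_schema = StructType(df.schema.fields + [StructField("prediction", DoubleType())])
result_df = df.groupBy("model_id").applyInPandas(run_inference, schema=out_schema)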

This design reduces redundant model loading and uses Spark’s distributed compute layer efficiently.
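If the same model shows up in more than one group or batch handled by the same Python worker, a small cache can avoid loading it again. The sketch below is an assumption about how one might do this, not something from the original answer: `make_cached_loader` is a hypothetical helper, and the stand-in loader takes the place of `mlflow.pyfunc.load_model`. Whether it helps in practice depends on Spark reusing Python worker processes.

```python
from functools import lru_cache
from typing import Any, Callable, List


def make_cached_loader(load_fn: Callable[[str], Any], maxsize: int = 8) -> Callable[[str], Any]:
    """Wrap a model loader so each URI is loaded at most once per process
    (until the cache evicts it)."""
    return lru_cache(maxsize=maxsize)(load_fn)


# Stand-in loader for illustration; on a cluster this would be
# mlflow.pyfunc.load_model (assumption, not shown in the thread).
load_calls: List[str] = []


def fake_load_model(model_uri: str) -> str:
    load_calls.append(model_uri)
    return f"model loaded from {model_uri}"


cached_load = make_cached_loader(fake_load_model)
cached_load("models:/model_a/1")
cached_load("models:/model_a/1")  # served from the cache, no second load
cached_load("models:/model_b/1")
```

Inside a grouped function, the cached loader would be called in place of the direct load, with `maxsize` chosen so the cached models fit in worker memory.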

Mosaic AI & AI Functions

If your inference needs involve standard ML or LLM models, you can simplify further with AI Functions or Mosaic AI batch inference. These let you run model inference directly in SQL with functions like ai_query(), without manual looping or building pipelines. Note that ai_query() takes the name of a Model Serving endpoint as its first argument, so the model has to be available behind an endpoint first.

Example SQL:

sql

SELECT input_text, ai_query('my_serving_endpoint', input_text) AS prediction
FROM my_input_table

Summary of Best Practices

Technique	Use Case	Advantage
Spark UDFs (`mlflow.pyfunc.spark_udf`)	Standard ML models	Simplifies batch scoring
Pandas UDFs with `groupBy.applyInPandas`	Many models (per group)	Parallel per-model inference
Mosaic AI / AI Functions	LLMs or unified inference	Simplified SQL-based scaling
Delta Live Tables (DLT)	Scheduled, repeatable jobs	Automates production batch runs

In short, replace your Python loops with Spark-level Pandas UDFs or Databricks batch inference functions. This takes advantage of cluster parallelism and avoids sequential execution, allowing all your model inferences to run efficiently in parallel across nodes.

Re: how to speed up inference?

NandiniN — Fri, 24 Oct 2025 14:08:55 GMT

Hi @jeremy98

I have not tried this, but could you try using Python's multiprocessing library to assign the inference for different models to different CPU cores?
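A minimal sketch of that idea, using `concurrent.futures` from the standard library. Everything here is hypothetical: the two `predict_*` functions stand in for each model's own inference logic, and `run_all` is an illustrative helper. It only helps on a single machine with CPU-bound inference, and the functions need to be defined at module level so worker processes can import them.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence


# Hypothetical stand-ins for each model's specific inference logic.
def predict_linear(items: Sequence[float]) -> List[float]:
    return [2.0 * x + 1.0 for x in items]


def predict_threshold(items: Sequence[float]) -> List[float]:
    return [1.0 if x > 0.5 else 0.0 for x in items]


# Dispatch table: model id -> inference function.
INFERENCE_FNS: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "linear": predict_linear,
    "threshold": predict_threshold,
}


def run_one(model_id: str, items: Sequence[float]) -> List[float]:
    """Run the inference function registered for one model."""
    return INFERENCE_FNS[model_id](items)


def run_all(
    items_by_model: Dict[str, Sequence[float]],
    executor: Optional[Executor] = None,
) -> Dict[str, List[float]]:
    """Run every model's inference concurrently, one task per model."""
    own_executor = executor is None
    pool = executor if executor is not None else ProcessPoolExecutor()
    try:
        futures = {
            model_id: pool.submit(run_one, model_id, items)
            for model_id, items in items_by_model.items()
        }
        return {model_id: fut.result() for model_id, fut in futures.items()}
    finally:
        # Only shut down a pool this function created itself.
        if own_executor:
            pool.shutdown()
```

When running this as a script, call `run_all(...)` under an `if __name__ == "__main__":` guard, since process pools re-import the main module on some platforms. Passing a different executor is mainly useful for testing the dispatch logic.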

Also, here are a docs page and a blog post that may be useful: https://docs.databricks.com/en/machine-learning/model-serving/serve-multiple-models-to-serving-endpoint.html and https://www.databricks.com/blog/2022/07/20/parallel-ml-how-compass-built-a-framework-for-training-many-machine-learning-models-on-databricks.html

Thanks!
