How to Optimize Batch Inference for Per-Item ML Models in Databricks

jeremy98
Honored Contributor

Hi everyone, I'm relatively new to Databricks. I worked with it a few months ago, and today I ran into an issue in our system. We have n items and n corresponding ML models, one per item, and we want to run inference more efficiently, ideally in batch mode instead of looping through each model sequentially.

What would be a smart and efficient way to perform inference for all items? Is it recommended to use Model Serving and create an endpoint with Mosaic AI, or would that be unnecessarily expensive or overkill for our use case?

Currently, our pipelines call the relevant ML model and run inference on a single sample record. How can we speed this up?

Any advice or best practices would be greatly appreciated! Thank you!
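To make the setup concrete, here is a minimal plain-Python sketch of the kind of batching being asked about: records are grouped by item, each item's model is loaded once, and it scores all of that item's rows in one call instead of one record per call. Everything in it is hypothetical and only for illustration: `make_linear_model` stands in for loading a real per-item model, and `batch_predict_per_item`, `records`, and `weights` are made-up names, not part of our pipeline or of any Databricks API. On Spark, the same group-then-score pattern could presumably be expressed with `groupBy("item_id").applyInPandas(...)`, but that is an assumption and is not what this sketch runs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type: a "model" is any callable that scores a batch of feature rows.
Model = Callable[[List[List[float]]], List[float]]


def make_linear_model(weight: float) -> Model:
    """Stand-in for loading a real per-item model (assumption, for illustration)."""
    def predict(batch: List[List[float]]) -> List[float]:
        # Toy scoring rule: weight times the sum of the features of each row.
        return [weight * sum(row) for row in batch]
    return predict


def batch_predict_per_item(
    records: Iterable[Tuple[str, List[float]]],
    load_model: Callable[[str], Model],
) -> Dict[str, List[float]]:
    """Group rows by item, then call each item's model once on its whole batch."""
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for item_id, features in records:
        grouped[item_id].append(features)

    results: Dict[str, List[float]] = {}
    for item_id, batch in grouped.items():
        model = load_model(item_id)      # loaded once per item, not once per record
        results[item_id] = model(batch)  # one batched call per item
    return results


# Made-up sample data: two items, three records.
records = [("a", [1.0, 2.0]), ("b", [3.0, 4.0]), ("a", [5.0, 6.0])]
weights = {"a": 1.0, "b": 2.0}

predictions = batch_predict_per_item(
    records, lambda item_id: make_linear_model(weights[item_id])
)
print(predictions)
```

The point of the grouping step is that the cost of loading a model is paid once per item rather than once per record, and each group is independent of the others, so the groups could in principle be scored in parallel.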