Hello!
I was wondering how impactful a model's size versus its inference latency is when running inference in a distributed setting.
With tools like Pandas Iterator UDFs or mlflow.pyfunc.spark_udf(), we can make sure a model is loaded only once per worker. So I would tend to say that minimizing inference latency is more important than minimizing model size, since the loading cost is paid once per worker and model, whereas the latency cost is paid once per observation. Here is roughly the pattern I have in mind (sketched below).
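A minimal sketch of the iterator UDF pattern, assuming Spark 3.x; the model URI "models:/my_model/1" and the "feature" column name are placeholders, not anything real:

```python
from typing import Iterator

import pandas as pd
import mlflow.pyfunc
from pyspark.sql.functions import pandas_udf

@pandas_udf("double")
def predict(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Loaded once per task, before the batch loop, then reused for every batch
    model = mlflow.pyfunc.load_model("models:/my_model/1")  # placeholder URI
    for batch in batches:
        # "feature" is a placeholder column name expected by the model
        yield pd.Series(model.predict(pd.DataFrame({"feature": batch})))
```

With this signature, Spark feeds the UDF an iterator of batches within each task, so the load_model() call amortizes over all the batches of that task instead of being paid per row; mlflow.pyfunc.spark_udf() gives similar behavior without writing the loop by hand.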
I would also say the impact is even greater with ensemble models, where several models, each with its own latency, must each run inference once per observation (see the back-of-envelope after this).
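To make my reasoning concrete, here is a back-of-envelope with made-up numbers (assuming perfect parallelism; none of these figures are measurements):

```python
n_obs = 10_000_000      # rows to score
n_workers = 8           # executors doing the scoring
n_models = 3            # models in the ensemble
load_s = 5.0            # one-time load per model per worker, in seconds
infer_ms = 0.2          # per-row latency per model, in milliseconds

# Loading happens in parallel on every worker, so its wall-clock cost is
# n_models * load_s regardless of the worker count.
load_wall = n_models * load_s                                  # 15 s
# Inference is paid for every row, split across the workers.
infer_wall = n_models * (n_obs / n_workers) * infer_ms / 1e3   # 750 s
print(f"load: {load_wall:.0f} s, inference: {infer_wall:.0f} s")
```

Even with a fairly heavy 5 s load per model, the per-row latency dominates by roughly 50x in this example, which is what makes me think latency matters more than size.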
Is this assumption correct?
Thank you!