Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

MLFlow model loading taking long time and "model serving" failing during init

145093
New Contributor II

I am trying to load a simple MinMaxScaler model that was logged as a run through Spark's ML Pipeline API for reuse. On average it takes 40+ seconds just to load the model with the following example:

[Screenshot: simple model load]
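The load described above can be sketched roughly as follows. This is not the poster's original code, only a minimal illustration under stated assumptions: the helper names `timed_load` and `load_spark_model` are hypothetical, and the model URI must be supplied by the caller in MLflow's `runs:/<run_id>/<artifact_path>` form. Only `mlflow.spark.load_model` is taken from the thread.

```python
import time
from typing import Any, Callable, Tuple


def timed_load(model_uri: str, loader: Callable[[str], Any]) -> Tuple[Any, float]:
    """Call loader(model_uri) and return (model, elapsed_seconds)."""
    start = time.perf_counter()
    model = loader(model_uri)
    elapsed = time.perf_counter() - start
    return model, elapsed


def load_spark_model(model_uri: str) -> Tuple[Any, float]:
    """Hypothetical wrapper: load a Spark ML pipeline logged with MLflow and time it."""
    # Imported lazily so the timing helper above stays usable without mlflow installed.
    import mlflow.spark

    return timed_load(model_uri, mlflow.spark.load_model)
```

On a cluster with MLflow available, `model, seconds = load_spark_model(model_uri)` would return the loaded pipeline together with the wall-clock load time, which makes the 40-second and 3-minute cases easy to log side by side.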

This is fine, and the model transforms my data correctly. However, I have a scheduled job that has to run for a real-time application, and on some runs this simple model load randomly takes almost 3 minutes, as the output below shows:

[Screenshot: sometimes the model takes almost 3 min just to load]

I also tried loading with pyfunc instead of Spark, but it didn't help. I am running the scheduled job 24/7 on an all-purpose compute cluster on AWS with an i3 driver and 4 i3 workers, and a 3-minute model load will not meet my real-time needs. Since model loading is slow, I decided to try "model serving".
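For anyone wanting to reproduce the pyfunc versus Spark comparison, here is a minimal sketch that times each loader once on the same model URI. The helper names `compare_loaders` and `default_loaders` are hypothetical; `mlflow.spark.load_model` and `mlflow.pyfunc.load_model` are the two MLflow loaders referred to above. A single measurement per loader will not capture the run-to-run variance described in this post, so it would need to be repeated across several job runs.

```python
import time
from typing import Any, Callable, Dict


def compare_loaders(
    model_uri: str, loaders: Dict[str, Callable[[str], Any]]
) -> Dict[str, float]:
    """Time each loader once on the same URI and return seconds per loader name."""
    timings: Dict[str, float] = {}
    for name, load in loaders.items():
        start = time.perf_counter()
        load(model_uri)  # the loaded model is discarded; only the duration is kept
        timings[name] = time.perf_counter() - start
    return timings


def default_loaders() -> Dict[str, Callable[[str], Any]]:
    """Hypothetical helper returning the two MLflow loaders discussed here."""
    # Imported lazily so compare_loaders can be used without mlflow installed.
    import mlflow.pyfunc
    import mlflow.spark

    return {
        "spark": mlflow.spark.load_model,
        "pyfunc": mlflow.pyfunc.load_model,
    }
```

With MLflow installed, `compare_loaders(model_uri, default_loaders())` returns a dictionary of load times keyed by `"spark"` and `"pyfunc"`.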

Next, I clicked "register model" and tried model serving for my real-time needs, but ran into a separate issue: conda environment creation fails during init because pyspark fails to build. I verified that the model works when loaded straight from the run, but model serving fails even though I followed the guide and simply pressed "enable serving" in the model registry UI. The full logs from the model serving UI are attached below, but the error is this:

"

Failed to build pyspark

...

Conda environment creation failed during pip installation! See error above.

"

I need one of these two issues addressed to meet the real-time needs of my application: either faster, more consistent model loading, or model serving that actually works and doesn't fail to build.

2 REPLIES

RafaelC
New Contributor II

Hi,

By any chance, have you found a way to tackle this issue?

I'm having the same one. I don't need real time, but a 3-minute loading time when performing inference is a bit too much...

I'm not really sure what's going on in the background either.

Thanks!

DanSimpson
New Contributor II

Hello,

Any solutions found for this issue?

I'm serving a large number of models at a time, but since we converted to PySpark (due to our data demands), mlflow.spark.load_model() is taking hours.

Part of the reason to switch to spark was to help with speed as we scale, but this feels like a huge trade-off.

Any suggestions would be helpful! Thank you!
