Hi.
We have around 30 models in model storage that we use for batch scoring. They were created at different times, by different people, and on different cluster runtimes.
We have now run into the problem that we can't deserialize the models and use them for inference, because the versions of Spark and/or sklearn on the scoring cluster don't match the versions the models were logged with. A rough sketch of how we load them is below.
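For context, this is roughly how our batch scoring jobs load a model; the model name and table name are placeholders, not our real ones. The spark_udf call is where the deserialization errors surface:

```python
import mlflow
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder URI; each of the ~30 registered models is loaded like this
model_uri = "models:/some_batch_model/Production"

# Wrap the logged model as a Spark UDF for batch scoring. This is the step
# that fails with deserialization errors when the cluster's Spark/sklearn
# versions differ from the ones the model was trained and logged with.
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri, result_type="double")

features = spark.table("some_feature_table")  # placeholder input table
scored = features.withColumn("prediction", predict_udf(*features.columns))
```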
What I've tried:
- using the requirements.txt from each model together with pip install (roughly as in the sketch after this list)
- Problems with this solution:
- it's not possible to change the Spark version on a cluster with pip install, and deserializing the models depends on Spark
- sometimes the autogenerated requirements.txt from mlflow.log_model() contains incompatible package versions, so pip install exits with an error code
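To be concrete, this is roughly what the install step looks like today (the model URI is a placeholder, and I'm assuming a recent MLflow where mlflow.pyfunc.get_model_dependencies() is available to resolve the model's requirements.txt):

```python
import subprocess
import mlflow

# Placeholder URI for one of the registered models
model_uri = "models:/some_batch_model/Production"

# Resolve the requirements.txt that mlflow.log_model() stored with the model
req_path = mlflow.pyfunc.get_model_dependencies(model_uri)

# In a Databricks notebook this would normally be `%pip install -r $req_path`;
# calling pip directly here to show where the non-zero exit code comes from.
result = subprocess.run(["pip", "install", "-r", req_path])
if result.returncode != 0:
    raise RuntimeError(f"pip failed for {model_uri}: exit code {result.returncode}")

# Even when this succeeds, it only changes Python packages (e.g. sklearn);
# the Spark version is fixed by the cluster's Databricks runtime and cannot
# be swapped out with pip.
```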
My questions are:
- Is there a recommended way of handling (batch) scoring and keeping track of the combination of cluster runtime and requirements for each model? Is there any Databricks documentation I can read on this?
- Can I find in the Model Registry which cluster ID or runtime a model was created on? (See the sketch after this list for where I would expect that information to show up.)
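To make the second question concrete, the only place I can think to look is the run that each model version points back to. I don't know which tags (if any) Databricks sets for the cluster or runtime, so this just dumps whatever tags happen to be on the run; the model name is a placeholder:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Placeholder model name; in practice I'd loop over all ~30 registered models
for mv in client.search_model_versions("name = 'some_batch_model'"):
    run = client.get_run(mv.run_id)
    # If the cluster ID or runtime version is recorded anywhere, I'd expect
    # it to appear in the run's tags (I don't know the exact tag names
    # Databricks uses, hence printing everything).
    print(mv.name, mv.version, run.data.tags)
```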
Thanks.