03-22-2023 05:35 AM
Dear community,
I basically want to store two pickle files alongside my Keras model during training and model registration, so that when I access the model from another workspace (using mlflow.set_registry_uri()), these files can be accessed as well. The custom MLflow model that I am using is as follows:
class KerasModel(mlflow.pyfunc.PythonModel):
    def __init__(self, model, tokenizer_path, label_encoder_path):
        self.model = model
        self.tokenizer_path = tokenizer_path
        self.label_encoder_path = label_encoder_path

    def _load_tokenizer(self):
        return joblib.load(self.tokenizer_path)

    def _load_label_encoder(self):
        return joblib.load(self.label_encoder_path)

    def predict(self, context, input_data):
        return self.model.predict(input_data)
and here is my training script:
import joblib
import mlflow
import mlflow.keras
import mlflow.tensorflow
from keras.preprocessing.text import Tokenizer
from sklearn.preprocessing import LabelEncoder
import keras
import tensorflow
# Load and preprocess data into train/test splits
X_train, y_train = get_training_data()
########
# do data preprocessing.....
########
tokenizer_artifact_path = "/dbfs/tmp/train/tokenizer.pkl"
joblib.dump(fitted_tokenizer, tokenizer_artifact_path)
label_encoder_artifact_path = "/dbfs/tmp/train/label_encoder.pkl"
joblib.dump(fitted_label_encoder, label_encoder_artifact_path)
with mlflow.start_run() as mlflow_run:
    # Build and fit the Keras model, then log it
    ########
    # build keras model.....
    ########
    model_history = model.fit(X_train, y_train)  # Keras fit() returns a History object
    mlflow.keras.log_model(model, "model")
    # Log the label encoder and tokenizer as run artifacts
    mlflow.log_artifact(tokenizer_artifact_path)
    mlflow.log_artifact(label_encoder_artifact_path)
    # Create a PyFunc model that wraps the trained Keras model and the pickle paths
    pyfunc_model = KerasModel(model, tokenizer_artifact_path, label_encoder_artifact_path)
    mlflow.pyfunc.log_model("custom_model", python_model=pyfunc_model)
    # Get the MLflow artifact URI of this run
    artifact_uri = mlflow_run.info.artifact_uri
    model_uri = artifact_uri + "/custom_model"
    # Register the model to the MLflow Model Registry
    mlflow.set_registry_uri("my_registry_uri")
    mlflow.register_model(model_uri, name="keras_classification")
The problem is that when I want to access this registered model from another workspace, I can load the model but not the pickle files, and it throws the error:
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/train/label_encoder.pkl'
I use the following code:
mlflow.set_registry_uri("my model_registry_uri")
model = mlflow.pyfunc.load_model("model_uri")
unwrapped_model = model.unwrap_python_model()
label_encoder = unwrapped_model._load_label_encoder()
tokenizer = unwrapped_model._load_tokenizer()
It works in the same workspace since the path is resolvable there, but another workspace has no access to it. My question is: how do I store these two pickle files with the model so that wherever the model goes, these files go as well?
I have checked this solution here as well; unfortunately, I could not understand it completely.
If you could post your answer with code, I would really appreciate it!
With many thanks in advance!
03-23-2023 04:52 AM
@Saeid Hedayati :
To store the pickle files along with the MLflow model, you can include them as artifacts when logging the model. You can modify your training script as follows:
import joblib
import mlflow
import mlflow.keras
import mlflow.tensorflow
from keras.preprocessing.text import Tokenizer
from sklearn.preprocessing import LabelEncoder
import keras
import tensorflow
# Load and preprocess data into train/test splits
X_train, y_train = get_training_data()
########
# do data preprocessing.....
########
tokenizer_artifact_path = "/dbfs/tmp/train/tokenizer.pkl"
joblib.dump(fitted_tokenizer, tokenizer_artifact_path)
label_encoder_artifact_path = "/dbfs/tmp/train/label_encoder.pkl"
joblib.dump(fitted_label_encoder, label_encoder_artifact_path)
with mlflow.start_run() as mlflow_run:
    # Build and fit the Keras model, then log it
    ########
    # build keras model.....
    ########
    model_history = model.fit(X_train, y_train)  # Keras fit() returns a History object
    mlflow.keras.log_model(model, "model")
    # Log the label encoder and tokenizer as run artifacts
    mlflow.log_artifact(tokenizer_artifact_path)
    mlflow.log_artifact(label_encoder_artifact_path)
    # Create a PyFunc model that wraps the trained Keras model
    pyfunc_model = KerasModel(model, tokenizer_artifact_path, label_encoder_artifact_path)
    # Log the PyFunc model together with its artifacts; note that the
    # artifact path is the first argument and the model is a keyword argument
    mlflow.pyfunc.log_model(
        "custom_model",
        python_model=pyfunc_model,
        artifacts={
            "tokenizer": tokenizer_artifact_path,
            "label_encoder": label_encoder_artifact_path,
        },
    )
    # Get the MLflow artifact URI of this run
    artifact_uri = mlflow_run.info.artifact_uri
    model_uri = artifact_uri + "/custom_model"
    # Register the model to the MLflow Model Registry
    mlflow.set_registry_uri("my_registry_uri")
    mlflow.register_model(model_uri, name="keras_classification")
In the code above, the pickle files are logged together with the PyFunc model via the artifacts argument of mlflow.pyfunc.log_model(). The artifacts are specified as a dictionary whose keys are artifact names and whose values are the local paths to the artifact files. MLflow copies these files into the model's own artifact directory, so they travel with the model wherever it is loaded or registered.
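For the copied files to be found in another workspace, the PythonModel should resolve them through context.artifacts instead of the hard-coded DBFS paths. Here is a minimal sketch of an updated KerasModel that does this (the attribute names tokenizer and label_encoder are my choice, not part of your original class):

class KerasModel(mlflow.pyfunc.PythonModel):
    def __init__(self, model, tokenizer_path, label_encoder_path):
        self.model = model
        # These paths are only valid in the training workspace
        self.tokenizer_path = tokenizer_path
        self.label_encoder_path = label_encoder_path

    def load_context(self, context):
        # Called automatically by mlflow.pyfunc.load_model();
        # context.artifacts maps the keys of the artifacts dict passed to
        # log_model() to local copies of the files, in any workspace
        self.tokenizer = joblib.load(context.artifacts["tokenizer"])
        self.label_encoder = joblib.load(context.artifacts["label_encoder"])

    def predict(self, context, input_data):
        return self.model.predict(input_data)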
To load the model and the artifacts in another workspace, you can use the following code:
import mlflow.pyfunc

# Load the model from the MLflow Model Registry
mlflow.set_registry_uri("my model_registry_uri")
model = mlflow.pyfunc.load_model("model_uri")

# load_context() has already run during load_model(), so the tokenizer and
# label encoder were restored from the model's own artifacts
unwrapped_model = model.unwrap_python_model()
tokenizer = unwrapped_model.tokenizer
label_encoder = unwrapped_model.label_encoder

# Predict on new data
y_pred = model.predict(input_data)
In the code above, MLflow downloads the model together with its artifacts and calls load_context() inside load_model(), so the tokenizer and label encoder are loaded from files that ship with the model rather than from workspace-specific paths.
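As an aside, the registry and model URI strings above are placeholders. On Databricks, a remote workspace's registry is typically addressed with a URI of the form shown below, where the scope and prefix (here my_scope and my_prefix, hypothetical names) refer to a secret scope holding the remote workspace's host and token:

mlflow.set_registry_uri("databricks://my_scope:my_prefix")
# Registered models are then loaded with a models:/ URI
model = mlflow.pyfunc.load_model("models:/keras_classification/1")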
03-25-2023 07:29 AM
Hi @Suteja Kanuri ,
Thank you for the solution. I have also noticed that instead of passing the pickle file paths, I can pass the fitted_tokenizer and fitted_label_encoder objects directly to the KerasModel class. That solution worked for me, but yours also looks correct!
04-01-2023 09:03 PM
@Saeid Hedayati :
Yes, passing the tokenizer and label encoder objects directly to the KerasModel class, instead of passing their paths, is another way to do it; see the sketch below. I'm glad to hear that the solution worked for you! Let me know if you have any other questions.
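For completeness, a minimal sketch of that variant, assuming the same fitted_tokenizer and fitted_label_encoder objects from the training script: MLflow pickles the PythonModel instance itself when logging, so objects stored as attributes are serialized with the model and travel with it automatically.

class KerasModel(mlflow.pyfunc.PythonModel):
    def __init__(self, model, tokenizer, label_encoder):
        # The fitted objects become part of the pickled PythonModel,
        # so no external file paths are involved
        self.model = model
        self.tokenizer = tokenizer
        self.label_encoder = label_encoder

    def predict(self, context, input_data):
        return self.model.predict(input_data)

pyfunc_model = KerasModel(model, fitted_tokenizer, fitted_label_encoder)
mlflow.pyfunc.log_model("custom_model", python_model=pyfunc_model)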
04-03-2023 12:33 AM
Thank you @Suteja Kanuri for your support, really appreciated!