08-06-2021 08:36 AM
I'm getting the following error when I try to load an H2O model with MLflow for prediction:
Error:
Job with key $03017f00000132d4ffffffff$_990da74b0db027b33cc49d1d90934149 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
Source code:
# !pip install requests
# !pip install tabulate
# !pip install "colorama>=0.3.8"
# !pip install future
# !pip install -f http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o
# !pip install mlflow
# !wget https://github.com/mlflow/mlflow-example/blob/master/wine-quality.csv
import h2o
import random
import mlflow
import mlflow.h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
wine = h2o.import_file(path="winequality.csv")

# Random split: roughly 70% of rows for training, the remaining ~30% for testing
r = wine['quality'].runif()
train = wine[r < 0.7]
test = wine[r >= 0.7]

mlflow.set_tracking_uri('https://mlflow.xxxxxxx.cloud/')
mlflow.set_experiment("H2ORandomForestEstimator")

def train_random_forest(ntrees):
    with mlflow.start_run():
        rf = H2ORandomForestEstimator(ntrees=ntrees)
        # Every column except the target "quality" is used as a feature
        train_cols = [n for n in wine.col_names if n != "quality"]
        rf.train(train_cols, "quality", training_frame=train, validation_frame=test)

        # Log parameters, metrics and the model to the MLflow run
        mlflow.log_param("ntrees", ntrees)
        mlflow.log_metric("rmse", rf.rmse())
        mlflow.log_metric("r2", rf.r2())
        mlflow.log_metric("mae", rf.mae())
        mlflow.h2o.log_model(rf, "model")
        h2o.save_model(rf)

        # Predicting with the H2O model object directly works
        predict = rf.predict(test)
        print(predict.head())

for ntrees in [10, 20, 50, 100]:
    train_random_forest(ntrees)

import mlflow
logged_model = 's3://mlflow-sagemaker/1/66f7c015fe8d4fb080940f3d31003f49/artifacts/model'
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(test))
08-06-2021 03:56 PM
I ran this in Databricks and it worked without issues. Make sure your wget path is correct: the URL you posted points to the GitHub "blob" page, which downloads HTML rather than the raw CSV. That may be what is causing the problem.
%sh
wget https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv
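The HTML-instead-of-CSV problem described above can be caught before calling h2o.import_file. Below is a minimal sketch, assuming a simple prefix check is good enough: looks_like_html is a hypothetical helper (not part of h2o or mlflow), and the two demo files are throwaway stand-ins for a GitHub "blob" page and the raw CSV.

```python
import tempfile
from pathlib import Path


def looks_like_html(path, probe_bytes=512):
    """Heuristic (hypothetical helper): True if the file starts like an HTML page, not a CSV."""
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes)
    # Skip leading whitespace/newlines, which GitHub pages often start with
    text = head.decode("utf-8", errors="replace").lstrip().lower()
    return text.startswith("<!doctype html") or text.startswith("<html")


# Demo with two temporary files: one mimicking an HTML page saved under a
# .csv name, one mimicking the raw CSV header plus a data row.
with tempfile.TemporaryDirectory() as tmp:
    html_file = Path(tmp) / "blob.csv"
    html_file.write_text('\n\n<!DOCTYPE html>\n<html lang="en">...')
    csv_file = Path(tmp) / "raw.csv"
    csv_file.write_text('"fixed acidity","volatile acidity","quality"\n7.4,0.7,5\n')
    html_result = looks_like_html(html_file)
    csv_result = looks_like_html(csv_file)

print(html_result, csv_result)  # True False
```

In the setup from this thread, one would call looks_like_html("./wine-quality.csv") right after the wget and only proceed to h2o.import_file when it returns False.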
import h2o
import random
import mlflow
import mlflow.h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
wine = h2o.import_file(path="./wine-quality.csv")

# Random split: roughly 70% of rows for training, the remaining ~30% for testing
r = wine['quality'].runif()
train = wine[r < 0.7]
test = wine[r >= 0.7]

def train_random_forest(ntrees):
    with mlflow.start_run():
        rf = H2ORandomForestEstimator(ntrees=ntrees)
        # Every column except the target "quality" is used as a feature
        train_cols = [n for n in wine.col_names if n != "quality"]
        rf.train(train_cols, "quality", training_frame=train, validation_frame=test)

        mlflow.log_param("ntrees", ntrees)
        mlflow.log_metric("rmse", rf.rmse())
        mlflow.log_metric("r2", rf.r2())
        mlflow.log_metric("mae", rf.mae())
        mlflow.h2o.log_model(rf, "model")
        h2o.save_model(rf)

        predict = rf.predict(test)
        print(predict.head())

for ntrees in [10, 20, 50, 100]:
    train_random_forest(ntrees)
08-09-2021 07:11 AM
@Dan Zafar I gave the incorrect path in the original question, but I did train the model with the correct file:
!wget https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv
There are no issues when I predict with the H2O model object directly. The prediction only fails when I use MLflow's pyfunc flavour.
08-09-2021 07:11 AM
import mlflow
logged_model = 's3://mlflow-s3 sagemaker/1/58e5371188ed4t649d2d75686a9f155d/artifacts/model'
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(test))
08-09-2021 07:13 AM
Error
OSError: Job with key $03017f00000132d4ffffffff$_9993cede52525f90fe9729b1ddb24cf7 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
stacktrace:
java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
    at hex.Model.adaptTestForTrain(Model.java:1568)
08-09-2021 07:14 AM
Error
stacktrace:
java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
    at hex.Model.adaptTestForTrain(Model.java:1568)
    at hex.Model.adaptTestForTrain(Model.java:1404)
    at hex.Model.adaptTestForTrain(Model.java:1400)
    at hex.Model.score(Model.java:1697)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:422)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1637)
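The error means H2O found none of the training feature names among the columns of the frame it was asked to score. The sketch below reproduces that condition in plain Python. columns_in_common is a hypothetical helper and a simplified stand-in for the check in hex.Model.adaptTestForTrain, not H2O's implementation. The feature names are assumed to match the wine-quality.csv header, and the "positional" case is an assumption for illustration: a DataFrame whose columns are labelled 0, 1, 2, ... instead of by feature name.

```python
# Simplified, hypothetical stand-in for the column check H2O runs before
# scoring (hex.Model.adaptTestForTrain); this is NOT H2O's actual code.
def columns_in_common(training_cols, scoring_cols):
    """Return the training feature names that also appear in the scoring frame."""
    scoring = set(scoring_cols)
    return [c for c in training_cols if c in scoring]


# Feature names the model was trained on (assumed to match wine-quality.csv).
training_cols = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

# Case 1: the scoring frame keeps the original header names.
named_cols = list(training_cols)

# Case 2 (assumption for illustration): positional labels 0..n-1, which is
# what a pandas DataFrame built without column names would have.
positional_cols = list(range(len(training_cols)))

for label, cols in [("named", named_cols), ("positional", positional_cols)]:
    common = columns_in_common(training_cols, cols)
    if not common:
        print(f"{label}: no columns in common with the training set")
    else:
        print(f"{label}: {len(common)} shared columns")
```

If the positional case matches what you see, printing the columns of the DataFrame passed to loaded_model.predict (for example print(df.columns)) should confirm it. One thing worth trying, as a suggestion rather than a confirmed fix, is test.as_data_frame(), which converts an H2OFrame into a pandas DataFrame that keeps the frame's column names.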