08-06-2021 08:36 AM
I'm getting the following error when I try to load an H2O model with MLflow for prediction.
Error:
Job with key $03017f00000132d4ffffffff$_990da74b0db027b33cc49d1d90934149 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
Source code:
# !pip install requests
# !pip install tabulate
# !pip install "colorama>=0.3.8"
# !pip install future
# !pip install -f http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o
# !pip install mlflow
# !wget https://github.com/mlflow/mlflow-example/blob/master/wine-quality.csv
import h2o
import random
import mlflow
import mlflow.h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
wine = h2o.import_file(path="winequality.csv")
r = wine['quality'].runif()
train = wine[r < 0.7]
test = wine[0.3 <= r]

mlflow.set_tracking_uri('https://mlflow.xxxxxxx.cloud/')
mlflow.set_experiment("H2ORandomForestEstimator")

def train_random_forest(ntrees):
    with mlflow.start_run():
        rf = H2ORandomForestEstimator(ntrees=ntrees)
        train_cols = [n for n in wine.col_names if n != "quality"]
        rf.train(train_cols, "quality", training_frame=train, validation_frame=test)
        mlflow.log_param("ntrees", ntrees)
        mlflow.log_metric("rmse", rf.rmse())
        mlflow.log_metric("r2", rf.r2())
        mlflow.log_metric("mae", rf.mae())
        mlflow.h2o.log_model(rf, "model")
        h2o.save_model(rf)
        predict = rf.predict(test)
        print(predict.head())

for ntrees in [10, 20, 50, 100]:
    train_random_forest(ntrees)

import mlflow
logged_model = 's3://mlflow-sagemaker/1/66f7c015fe8d4fb080940f3d31003f49/artifacts/model'
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(test))
08-06-2021 03:56 PM
I ran this in Databricks and it worked with no issues. I suggest you make sure your wget path is correct, because the one you posted downloads an HTML page, not the raw CSV. That may be causing the problem.
%sh
wget https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv
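To confirm which kind of file a download actually produced, a quick check along these lines may help. This is only a minimal sketch: the helper names `starts_like_html` and `file_looks_like_html` are made up for illustration, and the test is a simple heuristic on the first bytes of the file, not a full format validation.

```python
def starts_like_html(head: bytes) -> bool:
    """Heuristic: True if the leading bytes look like an HTML page, not CSV."""
    probe = head.lstrip().lower()
    return probe.startswith((b"<!doctype html", b"<html"))


def file_looks_like_html(path, probe_bytes=512):
    """Read only the first few bytes of the file and apply the heuristic."""
    with open(path, "rb") as handle:
        return starts_like_html(handle.read(probe_bytes))
```

Calling `file_looks_like_html("wine-quality.csv")` before `h2o.import_file` would flag the case where the GitHub page was saved instead of the raw data.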
import h2o
import random
import mlflow
import mlflow.h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
wine = h2o.import_file(path="./wine-quality.csv")
r = wine['quality'].runif()
train = wine[r < 0.7]
test = wine[0.3 <= r]

def train_random_forest(ntrees):
    with mlflow.start_run():
        rf = H2ORandomForestEstimator(ntrees=ntrees)
        train_cols = [n for n in wine.col_names if n != "quality"]
        rf.train(train_cols, "quality", training_frame=train, validation_frame=test)
        mlflow.log_param("ntrees", ntrees)
        mlflow.log_metric("rmse", rf.rmse())
        mlflow.log_metric("r2", rf.r2())
        mlflow.log_metric("mae", rf.mae())
        mlflow.h2o.log_model(rf, "model")
        h2o.save_model(rf)
        predict = rf.predict(test)
        print(predict.head())

for ntrees in [10, 20, 50, 100]:
    train_random_forest(ntrees)
08-09-2021 07:11 AM
@Dan Zafar I gave the incorrect path in the original question, but I did train the model with the correct file.
!wget https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv
There are no issues when predicting with the H2O model object, but the prediction fails when using MLflow's pyfunc flavour.
08-09-2021 07:11 AM
import mlflow
logged_model = 's3://mlflow-s3 sagemaker/1/58e5371188ed4t649d2d75686a9f155d/artifacts/model'
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(test))
08-09-2021 07:13 AM
Error
OSError: Job with key $03017f00000132d4ffffffff$_9993cede52525f90fe9729b1ddb24cf7 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
stacktrace:
java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
    at hex.Model.adaptTestForTrain(Model.java:1568)
08-09-2021 07:14 AM
Error
stacktrace:
java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
    at hex.Model.adaptTestForTrain(Model.java:1568)
    at hex.Model.adaptTestForTrain(Model.java:1404)
    at hex.Model.adaptTestForTrain(Model.java:1400)
    at hex.Model.score(Model.java:1697)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:422)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1637)
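The exception text says that the frame being scored shares no column names with the training frame. One thing that might be worth checking (an assumption, not something confirmed in this thread) is whether the pandas DataFrame passed to the pyfunc model still carries the training column names, since `test` here is an H2OFrame and not a pandas object. H2O's own conversion method is `test.as_data_frame()`. The sketch below is plain Python with hypothetical helper names, and only illustrates the kind of name comparison the error message describes:

```python
def shared_columns(training_columns, input_columns):
    """Return the training column names also present in the input, in training order."""
    present = set(input_columns)
    return [name for name in training_columns if name in present]


def check_input_columns(training_columns, input_columns):
    """Raise early, with both name lists, if the input shares no columns with training."""
    shared = shared_columns(training_columns, input_columns)
    if not shared:
        raise ValueError(
            "input has no columns in common with the training set: "
            f"training={list(training_columns)!r}, input={list(input_columns)!r}"
        )
    return shared
```

For example, comparing `train_cols` from the training script against `list(df.columns)` of the DataFrame about to be scored would show immediately whether the names survived the conversion.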