
Error loading h2o model in mlflow

vas610
New Contributor III

I'm getting the following error when I try to load an H2O model with MLflow for prediction.

Error:

   Error
   Job with key $03017f00000132d4ffffffff$_990da74b0db027b33cc49d1d90934149 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set 

Source code:

# !pip install requests
# !pip install tabulate
# !pip install "colorama>=0.3.8"
# !pip install future
# !pip install -f http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o
# !pip install mlflow
# !wget https://github.com/mlflow/mlflow-example/blob/master/wine-quality.csv

 import h2o
 import random
 import mlflow
 import mlflow.h2o
 from h2o.estimators.random_forest import H2ORandomForestEstimator
 h2o.init()
 wine = h2o.import_file(path="winequality.csv")
 r = wine['quality'].runif()
 train = wine[r  < 0.7]
 test  = wine[0.3 <= r]
 mlflow.set_tracking_uri('https://mlflow.xxxxxxx.cloud/')
 mlflow.set_experiment("H2ORandomForestEstimator")
 
 def train_random_forest(ntrees):
     with mlflow.start_run():
         rf = H2ORandomForestEstimator(ntrees=ntrees)
         train_cols = [n for n in wine.col_names if n != "quality"]
         rf.train(train_cols, "quality", training_frame=train, validation_frame=test)      
         mlflow.log_param("ntrees", ntrees)        
         mlflow.log_metric("rmse", rf.rmse())
         mlflow.log_metric("r2", rf.r2())
         mlflow.log_metric("mae", rf.mae())       
         mlflow.h2o.log_model(rf, "model")        
         h2o.save_model(rf)            
         predict = rf.predict(test)        
         print(predict.head())

 for ntrees in [10, 20, 50, 100]:
     train_random_forest(ntrees)

 import mlflow
 logged_model = 's3://mlflow-sagemaker/1/66f7c015fe8d4fb080940f3d31003f49/artifacts/model'

 # Load model as a PyFuncModel.
 loaded_model = mlflow.pyfunc.load_model(logged_model)

 # Predict on a Pandas DataFrame.
 import pandas as pd
 loaded_model.predict(pd.DataFrame(test))

5 REPLIES

Dan_Z
Honored Contributor

I ran this in Databricks and it worked with no issues. I suggest you make sure your wget path is correct, because the one you posted downloads HTML, not the raw CSV. That may be causing the problem.

%sh
wget https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv

import h2o
import random
import mlflow
import mlflow.h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
wine = h2o.import_file(path="./wine-quality.csv")
r = wine['quality'].runif()
train = wine[r < 0.7]
test = wine[0.3 <= r]

def train_random_forest(ntrees):
    with mlflow.start_run():
        rf = H2ORandomForestEstimator(ntrees=ntrees)
        train_cols = [n for n in wine.col_names if n != "quality"]
        rf.train(train_cols, "quality", training_frame=train, validation_frame=test)

        mlflow.log_param("ntrees", ntrees)

        mlflow.log_metric("rmse", rf.rmse())
        mlflow.log_metric("r2", rf.r2())
        mlflow.log_metric("mae", rf.mae())

        mlflow.h2o.log_model(rf, "model")

        h2o.save_model(rf)

        predict = rf.predict(test)

        print(predict.head())

for ntrees in [10, 20, 50, 100]:
    train_random_forest(ntrees)

vas610
New Contributor III

@Dan Zafar I posted the wrong path in the original question, but I did train the model with the correct file.

!wget https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv

There are no issues when predicting with the H2O model object directly, but the prediction fails when using MLflow's pyfunc flavor.
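One difference between the two prediction paths is the input type: `rf.predict(test)` receives the H2OFrame returned by `h2o.import_file`, while the pyfunc `predict` is documented to take a pandas DataFrame. The thread does not confirm that this is the cause, but a minimal sketch of converting the frame with H2OFrame's `as_data_frame()` before calling the pyfunc model could look like the following. The helper names `to_pandas` and `predict_with_pyfunc` are hypothetical and are not part of h2o or MLflow.

```python
def to_pandas(frame):
    """Convert an H2OFrame-like object to pandas; pass anything else through.

    Assumption: `frame` exposes `as_data_frame()`, as h2o's H2OFrame does.
    """
    convert = getattr(frame, "as_data_frame", None)
    return convert() if callable(convert) else frame


def predict_with_pyfunc(pyfunc_model, frame):
    """Hypothetical helper: call a pyfunc model's predict() on pandas input."""
    return pyfunc_model.predict(to_pandas(frame))


# Usage with the names from this thread (requires h2o and mlflow):
# loaded_model = mlflow.pyfunc.load_model(logged_model)
# predictions = predict_with_pyfunc(loaded_model, test)
```

The conversion is duck-typed so that an input that is already a pandas DataFrame is passed through unchanged.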

vas610
New Contributor III

import mlflow
logged_model = 's3://mlflow-s3 sagemaker/1/58e5371188ed4t649d2d75686a9f155d/artifacts/model'
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(test))

vas610
New Contributor III

Error

OSError: Job with key $03017f00000132d4ffffffff$_9993cede52525f90fe9729b1ddb24cf7 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
stacktrace:
java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
    at hex.Model.adaptTestForTrain(Model.java:1568)

vas610
New Contributor III

Error

 

stacktrace:
java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
    at hex.Model.adaptTestForTrain(Model.java:1568)
    at hex.Model.adaptTestForTrain(Model.java:1404)
    at hex.Model.adaptTestForTrain(Model.java:1400)
    at hex.Model.score(Model.java:1697)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:422)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1637)
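The exception says the frame being scored shares no column names with the training frame, so it may help to compare the two sets of names before calling `predict`. The following is a minimal diagnostic sketch, not something posted in this thread; `column_overlap` is a hypothetical helper, and the feature names in the example are stand-ins for `train_cols`.

```python
def column_overlap(training_cols, input_cols):
    """Split the training column names into (shared, missing) for an input."""
    incoming = set(input_cols)
    shared = [c for c in training_cols if c in incoming]
    missing = [c for c in training_cols if c not in incoming]
    return shared, missing


# Example: an input whose columns are positional integers, not feature names.
shared, missing = column_overlap(["alcohol", "pH"], [0, 1])
print("shared:", shared)
print("missing:", missing)

# With the names from this thread (assumes `df` is the pandas input):
# shared, missing = column_overlap(train_cols, df.columns)
```

If `shared` comes back empty for the DataFrame passed to the pyfunc model, the input lost its column names somewhere before scoring.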
