<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Error loading h2o model in mlflow in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16851#M10953</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Error&lt;/B&gt;&lt;/P&gt;
&lt;P&gt; OSError: Job with key $03017f00000132d4ffffffff$_9993cede52525f90fe9729b1ddb24cf7 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set stacktrace: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set at hex.Model.adaptTestForTrain(Model.java:1568)&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 09 Aug 2021 14:13:45 GMT</pubDate>
    <dc:creator>vas610</dc:creator>
    <dc:date>2021-08-09T14:13:45Z</dc:date>
    <item>
      <title>Error loading h2o model in mlflow</title>
      <link>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16847#M10949</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I'm getting the following error when I try to load an H2O model with MLflow for prediction.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Error:&lt;/B&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;   Error
   Job with key $03017f00000132d4ffffffff$_990da74b0db027b33cc49d1d90934149 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;B&gt;Source code:&lt;/B&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# !pip install requests
# !pip install tabulate
# !pip install "colorama&amp;gt;=0.3.8"
# !pip install future
# !pip install -f &lt;A href="http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html" target="test_blank"&gt;http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html&lt;/A&gt; h2o
# !pip install mlflow
# !wget &lt;A href="https://github.com/mlflow/mlflow-example/blob/master/wine-quality.csv" target="test_blank"&gt;https://github.com/mlflow/mlflow-example/blob/master/wine-quality.csv&lt;/A&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;/P&gt; 
&lt;PRE&gt;&lt;CODE&gt; import h2o
 import random
 import mlflow
 import mlflow.h2o
 from h2o.estimators.random_forest import H2ORandomForestEstimator
 h2o.init()
 wine = h2o.import_file(path="wine-quality.csv")
 r = wine['quality'].runif()
 train = wine[r &amp;lt; 0.7]
 test  = wine[0.3 &amp;lt;= r]
 mlflow.set_tracking_uri('https://mlflow.xxxxxxx.cloud/')
 mlflow.set_experiment("H2ORandomForestEstimator")
 
 def train_random_forest(ntrees):
     with mlflow.start_run():
         rf = H2ORandomForestEstimator(ntrees=ntrees)
         train_cols = [n for n in wine.col_names if n != "quality"]
         rf.train(train_cols, "quality", training_frame=train, validation_frame=test)      
         mlflow.log_param("ntrees", ntrees)        
         mlflow.log_metric("rmse", rf.rmse())
         mlflow.log_metric("r2", rf.r2())
         mlflow.log_metric("mae", rf.mae())       
         mlflow.h2o.log_model(rf, "model")        
         h2o.save_model(rf)            
         predict = rf.predict(test)        
         print(predict.head())

 for ntrees in [10, 20, 50, 100]:
     train_random_forest(ntrees)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE&gt; import mlflow
 logged_model = 's3://mlflow-sagemaker/1/66f7c015fe8d4fb080940f3d31003f49/artifacts/model'

 # Load model as a PyFuncModel.
 loaded_model = mlflow.pyfunc.load_model(logged_model)

 # Predict on a Pandas DataFrame.
 import pandas as pd
 loaded_model.predict(pd.DataFrame(test))
&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Aug 2021 15:36:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16847#M10949</guid>
      <dc:creator>vas610</dc:creator>
      <dc:date>2021-08-06T15:36:18Z</dc:date>
    </item>
    <item>
      <title>Re: Error loading h2o model in mlflow</title>
      <link>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16848#M10950</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I ran this in Databricks and it worked with no issues. I suggest you make sure your wget path is correct, because the one you posted downloads HTML, not the raw csv. That &lt;I&gt;may&lt;/I&gt; cause the problem.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;%sh
wget &lt;A href="https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv" target="test_blank"&gt;https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv&lt;/A&gt;&lt;/CODE&gt;&lt;/PRE&gt;
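&lt;P&gt;As a rough sketch of the difference between the two links: a GitHub "blob" page URL can usually be rewritten to its raw.githubusercontent.com counterpart by dropping the blob segment. The helper name blob_to_raw is hypothetical, and the sketch assumes the branch name contains no slashes.&lt;/P&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url):
    """Map a github.com 'blob' page URL to its raw.githubusercontent.com form.

    Hypothetical helper for illustration; URLs that do not look like
    github.com/owner/repo/blob/ref/path are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / ref / path...
    if parts.netloc != "github.com" or segments[2:3] != ["blob"] or not segments[4:]:
        return url
    owner, repo, _, ref = segments[:4]
    raw_path = "/".join([owner, repo, ref] + segments[4:])
    return urlunsplit(("https", "raw.githubusercontent.com", "/" + raw_path, "", ""))


page_url = "https://github.com/mlflow/mlflow-example/blob/master/wine-quality.csv"
raw_url = blob_to_raw(page_url)
print(raw_url)
```

&lt;P&gt;Fetching the rewritten URL with wget should give the CSV itself rather than the HTML page that wraps it.&lt;/P&gt;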
&lt;PRE&gt;&lt;CODE&gt;import h2o
import random
import mlflow
import mlflow.h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
wine = h2o.import_file(path="./wine-quality.csv")
r = wine['quality'].runif()
train = wine[r &amp;lt; 0.7]
test = wine[0.3 &amp;lt;= r]

def train_random_forest(ntrees):
    with mlflow.start_run():
        rf = H2ORandomForestEstimator(ntrees=ntrees)
        train_cols = [n for n in wine.col_names if n != "quality"]
        rf.train(train_cols, "quality", training_frame=train, validation_frame=test)
        mlflow.log_param("ntrees", ntrees)
        mlflow.log_metric("rmse", rf.rmse())
        mlflow.log_metric("r2", rf.r2())
        mlflow.log_metric("mae", rf.mae())
        mlflow.h2o.log_model(rf, "model")
        h2o.save_model(rf)
        predict = rf.predict(test)
        print(predict.head())

for ntrees in [10, 20, 50, 100]:
    train_random_forest(ntrees)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Aug 2021 22:56:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16848#M10950</guid>
      <dc:creator>Dan_Z</dc:creator>
      <dc:date>2021-08-06T22:56:45Z</dc:date>
    </item>
    <item>
      <title>Re: Error loading h2o model in mlflow</title>
      <link>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16849#M10951</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@Dan Zafar, I posted the wrong path in the original question, but I did train the model with the correct file:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;!wget &lt;A href="https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv" target="test_blank"&gt;https://raw.githubusercontent.com/mlflow/mlflow-example/master/wine-quality.csv&lt;/A&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;There are no issues when predicting with the H2O model object, but prediction fails when using MLflow's pyfunc flavour.&lt;/P&gt; 
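&lt;P&gt;The message says the frame passed to predict shares no column names with the training frame, so one thing worth checking is which names the pandas input actually carries. A minimal sketch of that check in plain Python; common_columns is a hypothetical helper and the column names are illustrative only, not read from the real frames:&lt;/P&gt;

```python
def common_columns(train_cols, input_cols):
    """Return the training columns that also appear in the input, in training order.

    Hypothetical helper for illustration; it only compares labels.
    """
    input_names = set(input_cols)
    return [c for c in train_cols if c in input_names]


# Illustrative names only; substitute the columns the model was trained on.
train_cols = ["fixed acidity", "alcohol", "pH"]

# An input that kept its header shares names with the training set.
named_input = ["fixed acidity", "alcohol", "pH", "quality"]

# pandas assigns integer labels 0, 1, 2, ... when no column names are given,
# so such an input has nothing in common with the training names.
unnamed_input = [0, 1, 2, 3]

print(common_columns(train_cols, named_input))
print(common_columns(train_cols, unnamed_input))
```

&lt;P&gt;With the real objects, the second argument could be list(df.columns) for whatever pandas frame is handed to loaded_model.predict. H2OFrame.as_data_frame() is one way to obtain a pandas frame that keeps the H2O column names; whether that resolves the error here is an assumption, not something confirmed in this thread.&lt;/P&gt;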
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Aug 2021 14:11:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16849#M10951</guid>
      <dc:creator>vas610</dc:creator>
      <dc:date>2021-08-09T14:11:37Z</dc:date>
    </item>
    <item>
      <title>Re: Error loading h2o model in mlflow</title>
      <link>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16850#M10952</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import mlflow
logged_model = 's3://mlflow-s3 sagemaker/1/58e5371188ed4t649d2d75686a9f155d/artifacts/model'
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(test))&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Aug 2021 14:11:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16850#M10952</guid>
      <dc:creator>vas610</dc:creator>
      <dc:date>2021-08-09T14:11:59Z</dc:date>
    </item>
    <item>
      <title>Re: Error loading h2o model in mlflow</title>
      <link>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16851#M10953</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Error&lt;/B&gt;&lt;/P&gt;
&lt;P&gt; OSError: Job with key $03017f00000132d4ffffffff$_9993cede52525f90fe9729b1ddb24cf7 failed with an exception: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set stacktrace: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set at hex.Model.adaptTestForTrain(Model.java:1568)&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Aug 2021 14:13:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16851#M10953</guid>
      <dc:creator>vas610</dc:creator>
      <dc:date>2021-08-09T14:13:45Z</dc:date>
    </item>
    <item>
      <title>Re: Error loading h2o model in mlflow</title>
      <link>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16852#M10954</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Error&lt;/B&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;stacktrace: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
    at hex.Model.adaptTestForTrain(Model.java:1568)
    at hex.Model.adaptTestForTrain(Model.java:1404)
    at hex.Model.adaptTestForTrain(Model.java:1400)
    at hex.Model.score(Model.java:1697)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:422)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1637)&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Aug 2021 14:14:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-loading-h2o-model-in-mlflow/m-p/16852#M10954</guid>
      <dc:creator>vas610</dc:creator>
      <dc:date>2021-08-09T14:14:24Z</dc:date>
    </item>
  </channel>
</rss>

