<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: how to log the KerasClassifier model in a sklearn pipeline in mlflow? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15875#M10148</link>
    <description>&lt;P&gt;I think I found a workaround of sorts, but I think this issue still needs to be addressed.&lt;/P&gt;&lt;P&gt;What I did is not the best way.&lt;/P&gt;&lt;P&gt;I used a Python package called &lt;A href="https://scikeras.readthedocs.io/en/latest/" alt="https://scikeras.readthedocs.io/en/latest/" target="_blank"&gt;scikeras&lt;/A&gt; that does this wrapping, and I could then log the model.&lt;/P&gt;</description>
    <pubDate>Tue, 14 Sep 2021 04:18:26 GMT</pubDate>
    <dc:creator>MGH1</dc:creator>
    <dc:date>2021-09-14T04:18:26Z</dc:date>
    <item>
      <title>how to log the KerasClassifier model in a sklearn pipeline in mlflow?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15868#M10141</link>
      <description>&lt;P&gt;I have a set of pre-processing stages in a sklearn `Pipeline` and an estimator which is a `KerasClassifier` (`from tensorflow.keras.wrappers.scikit_learn import KerasClassifier`).&lt;/P&gt;&lt;P&gt;My overall goal is to tune and log the whole sklearn pipeline in `mlflow` (on Databricks). I get a confusing type error which I can't figure out how to resolve:&lt;/P&gt;&lt;P&gt;&amp;gt; TypeError: can't pickle _thread.RLock objects&lt;/P&gt;&lt;P&gt;I have the following code (without the tuning stage), which raises the above error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Imports were not shown in the original post; these are the usual locations.
import cloudpickle
import mlflow
import mlflow.sklearn
import numpy as np
import sklearn
import tensorflow as tf
from mlflow.models.signature import infer_signature
from mlflow.utils.environment import _mlflow_conda_env
from sklearn.pipeline import Pipeline
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# preprocessor, X_train, y_train, X_test and y_test are defined elsewhere (not shown).

# Pin the library versions used for training in the logged environment.
conda_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=[
        "cloudpickle=={}".format(cloudpickle.__version__),
        "scikit-learn=={}".format(sklearn.__version__),
        "numpy=={}".format(np.__version__),
        "tensorflow=={}".format(tf.__version__),
    ],
    additional_conda_channels=None,
)

search_space = {
    "estimator__dense_l1": 20,
    "estimator__dense_l2": 20,
    "estimator__learning_rate": 0.1,
    "estimator__optimizer": "Adam",
}


def create_model(n):
    # n is the hyperparameter dict passed through KerasClassifier(n=...).
    model = Sequential()
    model.add(Dense(int(n["estimator__dense_l1"]), activation="relu"))
    model.add(Dense(int(n["estimator__dense_l2"]), activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(
        loss="binary_crossentropy",
        optimizer=n["estimator__optimizer"],
        metrics=["accuracy"],
    )
    return model


mlflow.sklearn.autolog()

with mlflow.start_run(nested=True) as run:
    classfier = KerasClassifier(build_fn=create_model, n=search_space)
    # fit the pipeline
    clf = Pipeline(steps=[("preprocessor", preprocessor), ("estimator", classfier)])
    h = clf.fit(
        X_train,
        y_train.values,
        estimator__validation_split=0.2,
        estimator__epochs=10,
        estimator__verbose=2,
    )

    # log scores
    acc_score = clf.score(X=X_test, y=y_test)
    mlflow.log_metric("accuracy", acc_score)

    signature = infer_signature(X_test, clf.predict(X_test))
    # Log the model with a signature that defines the schema of the model's inputs and outputs.
    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        signature=signature,
        conda_env=conda_env,
    )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I also get this warning before the error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;WARNING mlflow.sklearn.utils: Truncated the value of the key `steps`. Truncated value: `[('preprocessor', ColumnTransformer(n_jobs=None, remainder='drop', sparse_threshold=0.3,
           transformer_weights=None,
           transformers=[('num',
                  Pipeline(memory=None,&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Note that the whole pipeline runs without error outside of mlflow.&lt;/P&gt;&lt;P&gt;Can someone help?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Sep 2021 03:52:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15868#M10141</guid>
      <dc:creator>MGH1</dc:creator>
      <dc:date>2021-09-10T03:52:57Z</dc:date>
    </item>
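<!--
Editorial sketch, not part of the original post. The error in the question says that a `threading.RLock` was reached somewhere in the object graph being serialized. The thread does not establish where that lock lives; the only step that differs from the working `RandomForestClassifier` variant is the Keras wrapper. The minimal example below uses only the standard library and a hypothetical `estimator_state` dict to show how a single lock makes an otherwise ordinary object unpicklable.

```python
import pickle
import threading

# Hypothetical stand-in for the state of a fitted estimator: ordinary data
# plus a re-entrant lock somewhere in the object graph.
estimator_state = {
    "weights": [0.1, 0.2, 0.3],
    "lock": threading.RLock(),
}


def pickling_error(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Plain data pickles without complaint.
plain_result = pickling_error({"weights": [0.1, 0.2, 0.3]})

# The lock makes the whole containing object unpicklable.
lock_result = pickling_error(estimator_state)

print("plain data:", plain_result)
print("with lock:", lock_result)
```

The exact wording of the message depends on the Python version, but it names `_thread.RLock` in the same way as the error reported above.
-->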
    <item>
      <title>Re: how to log the KerasClassifier model in a sklearn pipeline in mlflow?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15869#M10142</link>
      <description>&lt;P&gt;no one?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Sep 2021 07:10:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15869#M10142</guid>
      <dc:creator>MGH1</dc:creator>
      <dc:date>2021-09-10T07:10:10Z</dc:date>
    </item>
    <item>
      <title>Re: how to log the KerasClassifier model in a sklearn pipeline in mlflow?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15870#M10143</link>
      <description>&lt;P&gt;@Kaniz Fatma - Can you jump in here?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Sep 2021 14:11:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15870#M10143</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-09-10T14:11:36Z</dc:date>
    </item>
    <item>
      <title>Re: how to log the KerasClassifier model in a sklearn pipeline in mlflow?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15873#M10146</link>
      <description>&lt;P&gt;Thanks @Kaniz Fatma!&lt;/P&gt;&lt;P&gt;Just to clarify, I have no issue logging a plain sklearn model in a pipeline. For example, if I change this part of the above code from:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;clf = Pipeline(steps=[("preprocessor", preprocessor),
                      ("estimator", classfier)])&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;to:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;clf = Pipeline(steps=[("preprocessor", preprocessor),
                      ("estimator", RandomForestClassifier())])&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;it works without issue.&lt;/P&gt;&lt;P&gt;The problem only occurs when the estimator is a wrapped Keras model.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Sep 2021 07:44:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15873#M10146</guid>
      <dc:creator>MGH1</dc:creator>
      <dc:date>2021-09-11T07:44:33Z</dc:date>
    </item>
    <item>
      <title>Re: how to log the KerasClassifier model in a sklearn pipeline in mlflow?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15875#M10148</link>
      <description>&lt;P&gt;I think I found a workaround of sorts, but I think this issue still needs to be addressed.&lt;/P&gt;&lt;P&gt;What I did is not the best way.&lt;/P&gt;&lt;P&gt;I used a Python package called &lt;A href="https://scikeras.readthedocs.io/en/latest/" alt="https://scikeras.readthedocs.io/en/latest/" target="_blank"&gt;scikeras&lt;/A&gt; that does this wrapping, and I could then log the model.&lt;/P&gt;&lt;P&gt;The code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Some imports were not shown in the original post; these are the usual locations.
import cloudpickle
import mlflow
import mlflow.pyfunc
import numpy as np
import scikeras
import sklearn
import tensorflow as tf
from mlflow.models.signature import infer_signature
from mlflow.utils.environment import _mlflow_conda_env
from scikeras.wrappers import KerasClassifier
from sklearn.pipeline import Pipeline
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# preprocessor, X_train, y_train, X_test and y_test are defined elsewhere (not shown).


class ModelWrapper(mlflow.pyfunc.PythonModel):
    """Expose the fitted sklearn pipeline through the generic pyfunc interface."""

    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        return self.model.predict(model_input)


# Pin the library versions used for training in the logged environment.
conda_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=[
        "cloudpickle=={}".format(cloudpickle.__version__),
        "scikit-learn=={}".format(sklearn.__version__),
        "numpy=={}".format(np.__version__),
        "tensorflow=={}".format(tf.__version__),
        "scikeras=={}".format(scikeras.__version__),
    ],
    additional_conda_channels=None,
)

param = {
    "dense_l1": 20,
    "dense_l2": 20,
    "optimizer__learning_rate": 0.1,
    "optimizer": "Adam",
    "loss": "binary_crossentropy",
}


def create_model(dense_l1, dense_l2, meta):
    # scikeras passes `meta`, which describes the training data seen in fit().
    n_features_in_ = meta["n_features_in_"]
    X_shape_ = meta["X_shape_"]

    model = Sequential()
    model.add(Dense(n_features_in_, input_shape=X_shape_[1:], activation="relu"))
    model.add(Dense(dense_l1, activation="relu"))
    model.add(Dense(dense_l2, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    # No compile() here: scikeras compiles the model with the loss and
    # optimizer given to KerasClassifier.
    return model


mlflow.sklearn.autolog()
with mlflow.start_run(run_name="sample_run"):
    classfier = KerasClassifier(
        create_model,
        loss=param["loss"],
        dense_l1=param["dense_l1"],
        dense_l2=param["dense_l2"],
        optimizer__learning_rate=param["optimizer__learning_rate"],
        optimizer=param["optimizer"],
    )

    # fit the pipeline
    clf = Pipeline(steps=[("preprocessor", preprocessor), ("estimator", classfier)])
    h = clf.fit(X_train, y_train.values)

    # log scores
    acc_score = clf.score(X=X_test, y=y_test)
    mlflow.log_metric("accuracy", acc_score)
    signature = infer_signature(X_test, clf.predict(X_test))

    # Log the whole pipeline as a generic pyfunc model instead of via mlflow.sklearn.
    model_nn = ModelWrapper(clf)
    mlflow.pyfunc.log_model(
        python_model=model_nn,
        artifact_path="model",
        signature=signature,
        conda_env=conda_env,
    )&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 14 Sep 2021 04:18:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15875#M10148</guid>
      <dc:creator>MGH1</dc:creator>
      <dc:date>2021-09-14T04:18:26Z</dc:date>
    </item>
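<!--
Editorial sketch, not part of the original post. The thread does not say why switching to scikeras makes the pipeline loggable. One plausible reading is that its wrapper supplies its own serialization, so pickling no longer walks into the unpicklable parts of the Keras model. The example below shows the general Python hook for that idea, `__getstate__` and `__setstate__`, on a hypothetical `PicklableHolder` class. It illustrates the mechanism only and is not scikeras's actual implementation.

```python
import pickle
import threading


class PicklableHolder:
    """Hypothetical object that owns a lock but can still be pickled."""

    def __init__(self, weights):
        self.weights = weights
        self._lock = threading.RLock()

    def __getstate__(self):
        # Copy the instance state and leave out the unpicklable lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain data, then create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.RLock()


original = PicklableHolder([0.1, 0.2, 0.3])
restored = pickle.loads(pickle.dumps(original))

print(restored.weights)
```

The restored object carries the same data and a new lock of its own, so it is usable after loading even though the original lock was never serialized.
-->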
    <item>
      <title>Re: how to log the KerasClassifier model in a sklearn pipeline in mlflow?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15876#M10149</link>
      <description>&lt;P&gt;Could you please share the full error stack trace?&lt;/P&gt;</description>
      <pubDate>Sun, 03 Oct 2021 02:02:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-log-the-kerasclassifier-model-in-a-sklearn-pipeline-in/m-p/15876#M10149</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2021-10-03T02:02:02Z</dc:date>
    </item>
  </channel>
</rss>

