<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic ColumnTransformer not fitted after sklearn Pipeline loaded from Mlflow in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/columntransformer-not-fitted-after-sklearn-pipeline-loaded-from/m-p/11723#M6661</link>
    <description>&lt;P&gt;I am building a machine learning model with an sklearn Pipeline that uses a ColumnTransformer as a preprocessor ahead of the actual model. The pipeline is created as follows:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBClassifier

# num_cols and cat_cols are the lists of numerical and categorical column names
transformers = []
# Numerical columns: median imputation, then standardization
num_pipe = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
transformers.append(('numerical', num_pipe, num_cols))
# Categorical columns: most-frequent imputation, then one-hot encoding
cat_pipe = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('ohe', OneHotEncoder(handle_unknown='ignore'))
])
transformers.append(('categorical', cat_pipe, cat_cols))
# All remaining columns are passed through unchanged
preprocessor = ColumnTransformer(transformers, remainder='passthrough')
model = Pipeline([
    ('prep', preprocessor),
    ('clf', XGBClassifier())
])&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;After fitting the pipeline on the training data, I use MLflow to log it as an sklearn model artifact:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import mlflow
import mlflow.sklearn

model.fit(X, y)
# The second positional parameter of log_model is artifact_path (a path within the current run)
mlflow.sklearn.log_model(model, model_uri)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, when I load the model from MLflow for scoring, calling predict raises the error "This ColumnTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator."&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;run_model = mlflow.sklearn.load_model(model_uri)
run_model.predict(X_pred)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;After loading the model from MLflow, I also ran check_is_fitted on the second step of the Pipeline (the XGBoost classifier itself), and it is not fitted either.&lt;/P&gt;&lt;P&gt;Is MLflow not compatible with sklearn Pipelines that have multiple steps?&lt;/P&gt;</description>
    <pubDate>Tue, 02 Nov 2021 20:20:19 GMT</pubDate>
    <dc:creator>Nasreddin</dc:creator>
    <dc:date>2021-11-02T20:20:19Z</dc:date>
    <item>
      <title>ColumnTransformer not fitted after sklearn Pipeline loaded from Mlflow</title>
      <link>https://community.databricks.com/t5/data-engineering/columntransformer-not-fitted-after-sklearn-pipeline-loaded-from/m-p/11723#M6661</link>
      <pubDate>Tue, 02 Nov 2021 20:20:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/columntransformer-not-fitted-after-sklearn-pipeline-loaded-from/m-p/11723#M6661</guid>
      <dc:creator>Nasreddin</dc:creator>
      <dc:date>2021-11-02T20:20:19Z</dc:date>
    </item>
  </channel>
</rss>

