MLflow model serving: KeyError: 'python_function'

matebreeze — Fri, 26 Aug 2022 15:20:34 GMT

Hello,

I am training a logistic regression on text with the help of a tf-idf vectorizer.

This is done with MLflow and sklearn on Databricks.

The model itself trains successfully on Databricks, and I can make predictions with it from within the notebook on the Databricks platform.

The MLflow code that creates the model:

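import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# train_val and test are assumed to be DataFrames with 'text' and 'label' columns,
# defined earlier in the notebook.
with mlflow.start_run(run_name='logistic_regression') as run:

    # stop_words='english' selects the built-in stop word list;
    # stop_words=['english'] would only remove the literal word "english".
    text_transformer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2), lowercase=True, max_features=150000)

    lr = LogisticRegression(C=5e1, solver='lbfgs', multi_class='multinomial', random_state=17, n_jobs=4)

    # Fit the vectorizer on the training text and log it as its own model.
    text_transformer.fit(train_val['text'])
    mlflow.sklearn.log_model(text_transformer, "tfidf-model")

    X_train_text = text_transformer.transform(train_val['text'])
    X_test_text = text_transformer.transform(test['text'])

    # 5-fold stratified cross-validation on the training set.
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=17)
    cv_results = cross_val_score(lr, X_train_text, train_val['label'], cv=skf, scoring='f1_micro')

    # The mean F1 score is a result, so it is logged as a metric rather than a parameter.
    mlflow.log_metric("F1_score", cv_results.mean())

    # Fit the classifier on the full training set and log it as a second model.
    lr.fit(X_train_text, train_val['label'])
    mlflow.sklearn.log_model(lr, "lr-model")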

In the Models tab, only the logistic regression can be served without an issue.

Serving the tfidf vectorizer, however, fails with the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyError: 'python_function'

Inspecting the two models under Experiments, I noticed that the model file of the tfidf vectorizer has no 'python_function' entry under flavors, while the one for the logistic regression does.

logistic regression:

artifact_path: lr-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'

tfidf:

artifact_path: tfidf-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'
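The difference between the two files can also be checked programmatically. Below is a minimal sketch with the flavors sections above copied into plain dicts. The helper name has_pyfunc_flavor is hypothetical, and the idea that the serving code looks the key up directly is only an assumption based on the traceback.

```python
# Flavors copied by hand from the two model files shown above.
lr_flavors = {
    "python_function": {
        "env": "conda.yaml",
        "loader_module": "mlflow.sklearn",
        "model_path": "model.pkl",
        "python_version": "3.8.10",
    },
    "sklearn": {
        "code": None,
        "pickled_model": "model.pkl",
        "serialization_format": "cloudpickle",
        "sklearn_version": "0.24.1",
    },
}
tfidf_flavors = {
    "sklearn": {
        "code": None,
        "pickled_model": "model.pkl",
        "serialization_format": "cloudpickle",
        "sklearn_version": "0.24.1",
    },
}


def has_pyfunc_flavor(flavors):
    """Return True if the flavors mapping contains a 'python_function' entry."""
    return "python_function" in flavors


print(has_pyfunc_flavor(lr_flavors))     # True
print(has_pyfunc_flavor(tfidf_flavors))  # False

# A direct lookup on the tfidf flavors raises the same error type seen when serving.
try:
    tfidf_flavors["python_function"]
except KeyError as err:
    print("KeyError:", err)
```

So the two files differ only in the missing python_function block; the sklearn block is identical.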

Question:

Why is the tfidf model file structured differently / why does it lack python_function?
Is it possible to edit these model files manually so that I can add the key python_function?
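One possible direction, sketched under an assumption I have not confirmed: that MLflow only writes the python_function flavor for sklearn objects that expose a predict method. A vectorizer only transforms, so it would be skipped, whereas a Pipeline that chains the vectorizer and the classifier does have predict and takes raw text as input. The toy data and the artifact name "tfidf-lr-pipeline" below are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data standing in for train_val['text'] and train_val['label'] (illustrative only).
texts = ["good movie", "bad movie", "great film", "terrible film"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
classifier = LogisticRegression(random_state=17)

# A transformer exposes transform() but no predict(); a classifier has predict().
print(hasattr(vectorizer, "predict"))  # False
print(hasattr(classifier, "predict"))  # True

# Chaining both steps gives a single object that accepts raw text and has predict().
pipeline = Pipeline([("tfidf", vectorizer), ("lr", classifier)])
pipeline.fit(texts, labels)
predictions = pipeline.predict(["good film"])
print(hasattr(pipeline, "predict"))  # True

# Assumption: inside an MLflow run, the pipeline could then be logged as one model, e.g.
# mlflow.sklearn.log_model(pipeline, "tfidf-lr-pipeline")
```

If the assumption holds, serving the pipeline would also avoid having to call the vectorizer and the classifier as two separate endpoints.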

Thanks a lot for your help in advance,

best,

matebreeze
