
MLflow model serving: KeyError: 'python_function'

matebreeze
New Contributor

Hello,

I am training a logistic regression model on text, using a TF-IDF vectorizer for feature extraction.

This is done with MLflow and scikit-learn on Databricks.

The model trains successfully on Databricks, and I can make predictions with it from the notebook on the Databricks platform.

The MLflow code that creates the model:

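import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# train_val and test are defined earlier in the notebook (assumed to be DataFrames with 'text' and 'label' columns)
with mlflow.start_run(run_name='logistic_regression') as run:

    # stop_words='english' selects the built-in list; ['english'] would only drop the literal word "english"
    text_transformer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2), lowercase=True, max_features=150000)

    lr = LogisticRegression(C=5e1, solver='lbfgs', multi_class='multinomial', random_state=17, n_jobs=4)

    # Fit the vectorizer and log it as its own model
    text_transformer.fit(train_val['text'])
    mlflow.sklearn.log_model(text_transformer, "tfidf-model")

    X_train_text = text_transformer.transform(train_val['text'])
    X_test_text = text_transformer.transform(test['text'])

    # 5-fold stratified cross-validation, scored with micro-averaged F1
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=17)
    cv_results = cross_val_score(lr, X_train_text, train_val['label'], cv=skf, scoring='f1_micro')

    # The mean F1 score is a result, so it is logged as a metric rather than a parameter
    mlflow.log_metric("F1_score", cv_results.mean())

    # Fit the classifier on the full training set and log it as a second model
    lr.fit(X_train_text, train_val['label'])
    mlflow.sklearn.log_model(lr, "lr-model")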

In the Models tab, only the logistic regression can be served without problems.

Serving the TF-IDF vectorizer, however, fails with the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyError: 'python_function'

Inspecting the two models under Experiments shows that the model file of the TF-IDF vectorizer has no 'python_function' entry under flavors.

logistic regression:

artifact_path: lr-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'

tfidf:

artifact_path: tfidf-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'
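
The difference between the two model files can also be checked programmatically. The following is a minimal sketch, assuming the metadata has already been loaded into plain dictionaries. The dictionaries are abbreviated copies of the two listings above, and `has_pyfunc_flavor` and `pyfunc_loader_module` are hypothetical helpers, not MLflow functions. The direct lookup in the second helper is only a guess at how the serving code ends up raising the error.

```python
# Abbreviated copies of the two model files, as plain dicts.
lr_mlmodel = {
    "artifact_path": "lr-model",
    "flavors": {
        "python_function": {
            "env": "conda.yaml",
            "loader_module": "mlflow.sklearn",
            "model_path": "model.pkl",
            "python_version": "3.8.10",
        },
        "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "0.24.1"},
    },
}
tfidf_mlmodel = {
    "artifact_path": "tfidf-model",
    "flavors": {
        "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "0.24.1"},
    },
}


def has_pyfunc_flavor(mlmodel: dict) -> bool:
    """Hypothetical helper: True if the metadata declares a python_function flavor."""
    return "python_function" in mlmodel.get("flavors", {})


def pyfunc_loader_module(mlmodel: dict) -> str:
    """Direct lookup; raises KeyError('python_function') when the flavor is absent."""
    return mlmodel["flavors"]["python_function"]["loader_module"]


for mlmodel in (lr_mlmodel, tfidf_mlmodel):
    print(mlmodel["artifact_path"], has_pyfunc_flavor(mlmodel))
```

Calling `pyfunc_loader_module(tfidf_mlmodel)` raises the same `KeyError: 'python_function'` as in the traceback, which is consistent with the serving step expecting that flavor to be present.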

Questions:

  • Why is the tfidf model file structured differently, i.e. why does it lack python_function?
  • Is it possible to edit these model files manually so that I can add the python_function key?

Thanks a lot for your help in advance,

best,

matebreeze
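
A possible explanation, offered as a sketch rather than a confirmed answer: as far as I understand, the sklearn flavor in MLflow only writes a python_function entry when the logged object has a `predict` method, and a TfidfVectorizer only offers `transform`. This should be verified against the MLflow version in use (1.28.0 here). If that is the cause, hand-editing the model file would probably not be enough, because the serving wrapper would still have nothing to call for predictions. One common alternative is to combine vectorizer and classifier in a single scikit-learn Pipeline and log that instead. The toy data and the artifact name "text-clf-model" below are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in data (hypothetical); the original train_val data is not shown.
texts = ["good movie", "great film", "bad movie", "terrible film"]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
classifier = LogisticRegression(C=5e1, solver="lbfgs", random_state=17)

# The vectorizer can transform text but has no predict method; the classifier has one.
print("vectorizer has predict:", hasattr(vectorizer, "predict"))
print("classifier has predict:", hasattr(classifier, "predict"))

# A Pipeline exposes predict of its final step, so raw text goes in and labels come out.
pipeline = Pipeline([("tfidf", vectorizer), ("lr", classifier)])
pipeline.fit(texts, labels)
predictions = pipeline.predict(["good film"])
print("pipeline has predict:", hasattr(pipeline, "predict"))

# Inside an MLflow run, the single pipeline could then be logged in place of the
# two separate models (not executed here; the artifact name is hypothetical):
#     mlflow.sklearn.log_model(pipeline, "text-clf-model")
```

With this layout the served model accepts raw text directly, so the vectorizer does not need to be served on its own.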
