
MLflow model serving: KeyError: 'python_function'

matebreeze
New Contributor

Hello,

I am training a logistic regression on text with the help of a tf-idf vectorizer.

This is done with MLflow and sklearn in Databricks.

The model trains successfully in Databricks, and predictions work from within the notebook on the Databricks platform.

The MLflow code that creates the model:

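import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# train_val and test are assumed to be DataFrames with 'text' and 'label' columns (defined earlier, not shown).
with mlflow.start_run(run_name='logistic_regression') as run:

    # stop_words='english' selects the built-in stop word list; ['english'] would only drop the word "english".
    text_transformer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2), lowercase=True, max_features=150000)

    lr = LogisticRegression(C=5e1, solver='lbfgs', multi_class='multinomial', random_state=17, n_jobs=4)

    # Fit and log the vectorizer as its own model.
    text_transformer.fit(train_val['text'])
    mlflow.sklearn.log_model(text_transformer, "tfidf-model")

    X_train_text = text_transformer.transform(train_val['text'])
    X_test_text = text_transformer.transform(test['text'])

    # 5-fold stratified cross-validation on the training data.
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=17)
    cv_results = cross_val_score(lr, X_train_text, train_val['label'], cv=skf, scoring='f1_micro')

    # The mean F1 score is a result, so it is logged as a metric rather than a parameter.
    mlflow.log_metric("F1_score", cv_results.mean())

    # Fit on the full training data and log the classifier as a second model.
    lr.fit(X_train_text, train_val['label'])
    mlflow.sklearn.log_model(lr, "lr-model")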

In the Models tab, only the logistic regression can be served without an issue.

Serving the tf-idf vectorizer, however, fails with the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyError: 'python_function'
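
The traceback itself is just a failed dictionary lookup. As a minimal sketch, assuming the serving code reads the model's flavors mapping and indexes it by python_function (the real serving code is not visible in the traceback, and the dictionaries and the helper name pyfunc_config below are illustrative only):

```python
# Illustrative flavor mappings; the keys mimic what an MLmodel file can declare.
flavors_with_pyfunc = {
    "python_function": {"loader_module": "mlflow.sklearn"},
    "sklearn": {"pickled_model": "model.pkl"},
}
flavors_without_pyfunc = {
    "sklearn": {"pickled_model": "model.pkl"},
}


def pyfunc_config(flavors):
    """Hypothetical helper: return the python_function entry of a flavors mapping."""
    return flavors["python_function"]  # raises KeyError if the flavor is absent


def describe(flavors):
    """Return a short status string instead of letting the KeyError propagate."""
    try:
        return "servable via " + pyfunc_config(flavors)["loader_module"]
    except KeyError as err:
        return "missing flavor: " + str(err)


print(describe(flavors_with_pyfunc))
print(describe(flavors_without_pyfunc))
```

The second call reports the same missing key as the traceback, which suggests the problem lies in what was logged rather than in the serving endpoint.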

Inspecting the two models under Experiments shows that the MLmodel file of the tf-idf vectorizer has no python_function entry under flavors.

logistic regression:

artifact_path: lr-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'

tfidf:

artifact_path: tfidf-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'
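
A likely explanation for the difference between the two files: to my understanding, MLflow's sklearn flavor only adds the python_function entry when the logged object has a predict() method. That is an assumption about MLflow internals and worth confirming against the MLflow version in use. A tf-idf vectorizer is a transformer and exposes transform() rather than predict(), which a quick check shows (the helper has_predict is just for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def has_predict(estimator):
    """Return True if the estimator exposes a callable predict() method."""
    return callable(getattr(estimator, "predict", None))


vectorizer = TfidfVectorizer()
classifier = LogisticRegression()

print("TfidfVectorizer has predict:", has_predict(vectorizer))
print("LogisticRegression has predict:", has_predict(classifier))
print("TfidfVectorizer has transform:", callable(getattr(vectorizer, "transform", None)))
```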

Questions:

  • Why is the tfidf model file structured differently, i.e. why does it lack the python_function flavor?
  • Is it possible to edit these model files manually to add the python_function key?
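
One way around the missing flavor, sketched under assumptions: combine the vectorizer and the classifier in a single sklearn Pipeline and log that instead. A Pipeline ending in a classifier exposes predict(), so the logged model should carry the python_function flavor and accept raw text at serving time. The toy data, the artifact name text-clf-pipeline, and the helper log_pipeline are placeholders, not part of the original code. Hand-editing the MLmodel file to add the key is unlikely to help on its own, since, as far as I can tell, the served object would still need a predict() method.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in for train_val['text'] and train_val['label'].
texts = ["good movie", "great film", "bad movie", "terrible film"]
labels = [1, 1, 0, 0]

# Vectorizer and classifier chained into one estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2), lowercase=True)),
    ("lr", LogisticRegression(C=5e1, solver="lbfgs", random_state=17)),
])
pipeline.fit(texts, labels)

# The fitted pipeline takes raw strings and returns class labels.
predictions = pipeline.predict(["good film"])


def log_pipeline(model, artifact_path="text-clf-pipeline"):
    """Log the fitted pipeline as one MLflow model (needs MLflow and a tracking setup)."""
    import mlflow  # imported here so the rest of the sketch runs without MLflow
    import mlflow.sklearn

    with mlflow.start_run(run_name="logistic_regression"):
        mlflow.sklearn.log_model(model, artifact_path)
```

In a Databricks notebook, calling log_pipeline(pipeline) after fitting on the real data would replace the two separate log_model calls, and the served endpoint would then take text directly.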

Thanks a lot for your help in advance,

best,

matebreeze

