MLflow model serving: KeyError: 'python_function'

matebreeze
New Contributor

Hello,

I am training a logistic regression on text with the help of a TF-IDF vectorizer.

This is done with MLflow and scikit-learn in Databricks.

The model trains successfully in Databricks, and I can run predictions from within the Jupyter notebook on the Databricks platform.

The MLflow code that creates the model:

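import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# train_val and test are DataFrames with 'text' and 'label' columns.
with mlflow.start_run(run_name='logistic_regression') as run:

    # stop_words='english' selects the built-in English stop word list;
    # ['english'] would only remove the literal token "english".
    text_transformer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2), lowercase=True, max_features=150000)

    lr = LogisticRegression(C=5e1, solver='lbfgs', multi_class='multinomial', random_state=17, n_jobs=4)

    # Fit the vectorizer and log it as its own model.
    text_transformer.fit(train_val['text'])
    mlflow.sklearn.log_model(text_transformer, "tfidf-model")

    X_train_text = text_transformer.transform(train_val['text'])
    X_test_text = text_transformer.transform(test['text'])

    # 5-fold stratified cross-validation on the training data.
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=17)
    cv_results = cross_val_score(lr, X_train_text, train_val['label'], cv=skf, scoring='f1_micro')

    # The mean F1 score is a result, so it is logged as a metric rather than a parameter.
    mlflow.log_metric("F1_score", cv_results.mean())

    # Fit the classifier on the full training data and log it as a second model.
    lr.fit(X_train_text, train_val['label'])
    mlflow.sklearn.log_model(lr, "lr-model")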

In the Models tab, only the logistic regression can be served without an issue.

Serving the tfidf vectorizer, however, fails with the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyError: 'python_function'

Inspecting the two models under Experiments shows that the flavors of the tfidf vectorizer do not contain the key 'python_function'.

logistic regression:

artifact_path: lr-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'

tfidf:

artifact_path: tfidf-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'

Questions:

  • Why is the tfidf model file structured differently, i.e. why does it lack python_function?
  • Is it possible to edit these model files manually so that I can add the key python_function?

Thanks a lot for your help in advance,

best,

matebreeze
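
A possible direction, offered as a sketch rather than a confirmed answer: the two logged objects differ in that LogisticRegression has a predict method while TfidfVectorizer only has transform. Assuming that MLflow writes the python_function flavor only for scikit-learn objects that expose predict (an assumption based on the two model files above, not something verified here), wrapping both steps in one scikit-learn Pipeline would give a single object that accepts raw text and can predict. The tiny corpus, the labels, and the artifact name "tfidf-lr-pipeline" below are hypothetical placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical stand-ins for train_val['text'] and train_val['label'].
texts = ["good movie", "great film", "bad movie", "terrible film"]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
classifier = LogisticRegression(C=5e1, solver="lbfgs", random_state=17)

# A bare vectorizer only transforms text; it has no predict() method.
vectorizer_has_predict = hasattr(vectorizer, "predict")

# The pipeline vectorizes and classifies in one step and exposes predict()
# because its final step (the classifier) does.
pipeline = Pipeline([("tfidf", vectorizer), ("lr", classifier)])
pipeline.fit(texts, labels)
pipeline_has_predict = hasattr(pipeline, "predict")

# Predictions are made directly on raw text.
predictions = pipeline.predict(["good film"])

print("vectorizer has predict:", vectorizer_has_predict)
print("pipeline has predict:", pipeline_has_predict)

# Inside an MLflow run, the whole pipeline could then be logged as one model
# (hypothetical artifact name):
# mlflow.sklearn.log_model(pipeline, "tfidf-lr-pipeline")
```

With this layout the served model would take text as input, so the vectorizer would not need to be served separately. As for editing the model file by hand: adding the python_function key to the tfidf model file would probably not be enough on its own, since whatever loads the model for serving would still need a prediction method to call. That is also an assumption and has not been tested.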
