<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MLflow model serving: KeyError: 'python_function' in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/mlflow-model-serving-keyerror-python-function/m-p/33363#M1765</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am training a logistic regression on text with the help of a TF-IDF vectorizer.&lt;/P&gt;&lt;P&gt;This is done with MLflow and scikit-learn on Databricks.&lt;/P&gt;&lt;P&gt;The model trains successfully, and I can run predictions from a notebook on the Databricks platform.&lt;/P&gt;&lt;P&gt;The MLflow code that creates the model:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

with mlflow.start_run(run_name='logistic_regression') as run:
  
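    # 'english' selects scikit-learn's built-in stop-word list;
    # ['english'] would only drop the literal token "english".
    text_transformer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2), lowercase=True, max_features=150000)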
    
    lr = LogisticRegression(C=5e1, solver='lbfgs', multi_class='multinomial', random_state=17, n_jobs=4)
    
    text_transformer.fit(train_val['text'])
    mlflow.sklearn.log_model(text_transformer, "tfidf-model")
    
    X_train_text = text_transformer.transform(train_val['text'])
    X_test_text = text_transformer.transform(test['text'])
    
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=17)
    cv_results = cross_val_score(lr, X_train_text, train_val['label'], cv=skf, scoring='f1_micro')
    
    # The cross-validated F1 score is a result, so log it as a metric rather than a parameter.
    mlflow.log_metric("F1_score", cv_results.mean())
    
    lr.fit(X_train_text, train_val['label'])
    mlflow.sklearn.log_model(lr, "lr-model")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;In the Models tab, only the logistic regression can be served without an issue.&lt;/P&gt;&lt;P&gt;Serving the TF-IDF vectorizer fails with the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Traceback (most recent call last):
  File "&amp;lt;string&amp;gt;", line 1, in &amp;lt;module&amp;gt;
KeyError: 'python_function'&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Comparing the model files of the two models under Experiments shows that the TF-IDF vectorizer has no 'python_function' entry under flavors.&lt;/P&gt;&lt;P&gt;Logistic regression:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;artifact_path: lr-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;TF-IDF vectorizer:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;artifact_path: tfidf-model
databricks_runtime: 10.4.x-scala2.12
flavors:
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
mlflow_version: 1.28.0
model_uuid: some number
run_id: some number
utc_time_created: 'some date'&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;Questions&lt;/B&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Why is the TF-IDF model file structured differently, i.e. why does it lack the python_function flavor?&lt;/LI&gt;&lt;LI&gt;Is it possible to edit these model files manually so that I can add the python_function key?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks a lot in advance for your help,&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;matebreeze&lt;/P&gt;</description>
    <pubDate>Fri, 26 Aug 2022 15:20:34 GMT</pubDate>
    <dc:creator>matebreeze</dc:creator>
    <dc:date>2022-08-26T15:20:34Z</dc:date>
    <item>
      <title>MLflow model serving: KeyError: 'python_function'</title>
      <link>https://community.databricks.com/t5/machine-learning/mlflow-model-serving-keyerror-python-function/m-p/33363#M1765</link>
      <pubDate>Fri, 26 Aug 2022 15:20:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/mlflow-model-serving-keyerror-python-function/m-p/33363#M1765</guid>
      <dc:creator>matebreeze</dc:creator>
      <dc:date>2022-08-26T15:20:34Z</dc:date>
    </item>
  </channel>
</rss>

