<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Cannot log SparkML model to Unity Catalog due to missing output signature in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/cannot-log-sparkml-model-to-unity-catalog-due-to-missing-output/m-p/78598#M3425</link>
    <description>&lt;P&gt;I am training a Spark ML model (specifically, &lt;A href="https://mmlspark.blob.core.windows.net/docs/0.9.5/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier" target="_self"&gt;SynapseML LightGBM&lt;/A&gt;) in Databricks using MLflow autologging.&lt;/P&gt;&lt;P&gt;When I try to register the model in Unity Catalog, I get the following error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MlflowException: Model passed for registration contained a signature that includes only inputs. All models in the Unity Catalog must be logged with a model signature containing both input and output type specifications&lt;/LI-CODE&gt;&lt;P&gt;After some research, I found that the MLflow autologger correctly infers the model's input signature but leaves the output signature empty, and Unity Catalog requires both to register a model.&lt;/P&gt;&lt;P&gt;I was able to work around this by setting the signature manually with the following code:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import mlflow
from mlflow.models import ModelSignature

# Point at the model that autolog logged in the active run
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_info = mlflow.models.get_model_info(model_uri)

# to_dict() stores each schema as a JSON string, so the outputs are set as a JSON string too
signature_dict = model_info.signature.to_dict()
signature_dict["outputs"] = '[{"type": "double", "name": "prediction", "required": false}]'

# Rebuild the signature and write it back to the logged model
new_signature = ModelSignature.from_dict(signature_dict)
mlflow.models.set_signature(model_uri, new_signature)&lt;/LI-CODE&gt;&lt;P&gt;This seems to work, but it feels hacky and too manual. Is there a way to make the MLflow autologger infer and log the model's output signature so that this manual step is not needed?&lt;/P&gt;&lt;P&gt;Has anyone found a more elegant solution?&lt;/P&gt;</description>
    <pubDate>Fri, 12 Jul 2024 19:06:06 GMT</pubDate>
    <dc:creator>migq2</dc:creator>
    <dc:date>2024-07-12T19:06:06Z</dc:date>
    <item>
      <title>Cannot log SparkML model to Unity Catalog due to missing output signature</title>
      <link>https://community.databricks.com/t5/machine-learning/cannot-log-sparkml-model-to-unity-catalog-due-to-missing-output/m-p/78598#M3425</link>
      <description>&lt;P&gt;I am training a Spark ML model (specifically, &lt;A href="https://mmlspark.blob.core.windows.net/docs/0.9.5/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier" target="_self"&gt;SynapseML LightGBM&lt;/A&gt;) in Databricks using MLflow autologging.&lt;/P&gt;&lt;P&gt;When I try to register the model in Unity Catalog, I get the following error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MlflowException: Model passed for registration contained a signature that includes only inputs. All models in the Unity Catalog must be logged with a model signature containing both input and output type specifications&lt;/LI-CODE&gt;&lt;P&gt;After some research, I found that the MLflow autologger correctly infers the model's input signature but leaves the output signature empty, and Unity Catalog requires both to register a model.&lt;/P&gt;&lt;P&gt;I was able to work around this by setting the signature manually with the following code:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import mlflow
from mlflow.models import ModelSignature

# Point at the model that autolog logged in the active run
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_info = mlflow.models.get_model_info(model_uri)

# to_dict() stores each schema as a JSON string, so the outputs are set as a JSON string too
signature_dict = model_info.signature.to_dict()
signature_dict["outputs"] = '[{"type": "double", "name": "prediction", "required": false}]'

# Rebuild the signature and write it back to the logged model
new_signature = ModelSignature.from_dict(signature_dict)
mlflow.models.set_signature(model_uri, new_signature)&lt;/LI-CODE&gt;&lt;P&gt;This seems to work, but it feels hacky and too manual. Is there a way to make the MLflow autologger infer and log the model's output signature so that this manual step is not needed?&lt;/P&gt;&lt;P&gt;Has anyone found a more elegant solution?&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jul 2024 19:06:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/cannot-log-sparkml-model-to-unity-catalog-due-to-missing-output/m-p/78598#M3425</guid>
      <dc:creator>migq2</dc:creator>
      <dc:date>2024-07-12T19:06:06Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot log SparkML model to Unity Catalog due to missing output signature</title>
      <link>https://community.databricks.com/t5/machine-learning/cannot-log-sparkml-model-to-unity-catalog-due-to-missing-output/m-p/78862#M3432</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, I'm using &lt;FONT face="courier new,courier"&gt;mlflow-skinny[databricks]==2.14.3&lt;/FONT&gt; on a Databricks cluster with DBR 13.3 LTS.&lt;/P&gt;&lt;P&gt;I have tried training a model with the following libraries:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A href="https://spark.apache.org/mllib/" target="_self"&gt;Spark MLlib&lt;/A&gt;: does not log any signature at all (a snippet to reproduce this is &lt;A href="https://github.com/mlflow/mlflow/issues/12661" target="_self"&gt;here&lt;/A&gt;)&lt;/LI&gt;&lt;LI&gt;&lt;A href="https://mmlspark.blob.core.windows.net/docs/0.9.5/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier" target="_self" rel="nofollow noopener noreferrer"&gt;SynapseML LightGBM&lt;/A&gt;: logs an input signature but no output signature&lt;/LI&gt;&lt;LI&gt;&lt;A href="https://scikit-learn.org/stable/" target="_self"&gt;scikit-learn&lt;/A&gt;: logs a signature with both input and output. However, the output signature appears to be a &lt;A href="https://mlflow.org/docs/latest/model/signatures.html#tensor-based-signatures" target="_self"&gt;tensor-based signature&lt;/A&gt;, which I thought was meant for deep learning use cases, even though my example is a simple classification model on the iris dataset.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Here is the sklearn example:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import mlflow
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

print(f"MLFLOW version is: {mlflow.__version__}\n")

mlflow.autolog(exclusive=False)

with mlflow.start_run():
    # Train a sklearn model on the iris dataset
    X, y = datasets.load_iris(return_X_y=True, as_frame=True)
    clf = RandomForestClassifier(max_depth=7)
    clf.fit(X, y)
    
    model_info = mlflow.models.get_model_info(f"runs:/{mlflow.active_run().info.run_id}/model")
    
    print("Model signature:")
    print(model_info.signature)&lt;/LI-CODE&gt;&lt;P&gt;Output:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MLFLOW version is: 2.14.3

Model signature:
inputs: 
  ['sepal length (cm)': double (required), 'sepal width (cm)': double (required), 'petal length (cm)': double (required), 'petal width (cm)': double (required)]
outputs: 
  [Tensor('int64', (-1,))]
params: 
  None&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 15 Jul 2024 18:26:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/cannot-log-sparkml-model-to-unity-catalog-due-to-missing-output/m-p/78862#M3432</guid>
      <dc:creator>migq2</dc:creator>
      <dc:date>2024-07-15T18:26:12Z</dc:date>
    </item>
  </channel>
</rss>

