
Cannot log SparkML model to Unity Catalog due to missing output signature

migq2
New Contributor III

I am training a Spark ML model (specifically, a SynapseML LightGBM model) in Databricks using MLflow with autologging enabled.

When I try to register the model in Unity Catalog, I get the following error:

 

MlflowException: Model passed for registration contained a signature that includes only inputs. All models in the Unity Catalog must be logged with a model signature containing both input and output type specifications

 

After some research, I found that the MLflow autologger correctly infers my model's input signature but leaves the output signature empty, and Unity Catalog requires both to register the model.

I was able to circumvent this by using the following code to set my signature manually:

 

 

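import mlflow
from mlflow.models import ModelSignature

# URI of the model that autolog saved for the active run
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_info = mlflow.models.get_model_info(model_uri)

# to_dict() stores each schema as a JSON string, so outputs is set as a string too
signature_dict = model_info.signature.to_dict()
signature_dict["outputs"] = '[{"type": "double", "name": "prediction", "required": false}]'

# Rebuild the signature and overwrite the one stored with the logged model
new_signature = ModelSignature.from_dict(signature_dict)
mlflow.models.set_signature(model_uri, new_signature)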

 

This seems to work but feels hacky and too manual. Is there a way to make the MLflow autologger infer and log the model's output signature itself, so that this manual step is unnecessary?

Has anyone found a more elegant solution?
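
For readers who want to reuse the workaround above, here is a minimal sketch of the patching step as a plain function. It assumes the layout returned by `ModelSignature.to_dict()` in MLflow 2.x, where "inputs" and "outputs" are JSON strings. The function name `with_output_column` and the example column `feature_1` are hypothetical and only serve the illustration.

```python
import json


def with_output_column(signature_dict, name="prediction", dtype="double", required=False):
    """Return a copy of signature_dict whose "outputs" holds one column spec.

    An existing non-empty "outputs" entry is left untouched.
    """
    patched = dict(signature_dict)
    if not patched.get("outputs"):
        # Schemas are stored as JSON strings, so serialize the column list.
        patched["outputs"] = json.dumps(
            [{"type": dtype, "name": name, "required": required}]
        )
    return patched


# Hand-written dict in the assumed layout (not taken from a real run).
example = {
    "inputs": '[{"type": "double", "name": "feature_1", "required": true}]',
    "outputs": None,
    "params": None,
}
print(with_output_column(example)["outputs"])
```

The returned dict could then be passed to `ModelSignature.from_dict(...)` and `mlflow.models.set_signature(...)` exactly as in the snippet above. This does not make autolog infer the output; it only makes the manual step reusable and leaves an existing output schema alone.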

1 REPLY

migq2
New Contributor III

Hi @Retired_mod, I'm using mlflow-skinny[databricks]==2.14.3 in a Databricks cluster with DBR 13.3 LTS.

I have tried training a model with the following libraries:

  • Spark MLlib: does not log any signature at all (you can find the snippet to reproduce here)
  • SynapseML LightGBM: logs an input signature but no output signature
  • scikit-learn: logs a signature with both input and output. However, the output signature is Tensor-based, which I thought was meant for deep learning use cases, even though my example is a simple classification model on the iris dataset

    Here is the sklearn example:

 

 

 

import mlflow
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

print(f"MLFLOW version is: {mlflow.__version__}\n")

mlflow.autolog(exclusive=False)

with mlflow.start_run():
    # Train a sklearn model on the iris dataset
    X, y = datasets.load_iris(return_X_y=True, as_frame=True)
    clf = RandomForestClassifier(max_depth=7)
    clf.fit(X, y)
    
    model_info = mlflow.models.get_model_info(f"runs:/{mlflow.active_run().info.run_id}/model")
    
    print("Model signature:")
    print(model_info.signature)

 

 

 

Output: 

 

 

 

MLFLOW version is: 2.14.3

Model signature:
inputs: 
  ['sepal length (cm)': double (required), 'sepal width (cm)': double (required), 'petal length (cm)': double (required), 'petal width (cm)': double (required)]
outputs: 
  [Tensor('int64', (-1,))]
params: 
  None
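
To tell a Tensor-based output schema apart from a column-based one programmatically, a small sketch follows. It assumes the JSON layout MLflow 2.x uses when serializing schemas: tensor entries carry "type": "tensor" plus a "tensor-spec" object, and column entries carry a type name and an optional "name". The function name `describe_output_specs` and both sample strings are hypothetical.

```python
import json


def describe_output_specs(outputs_json):
    """Label each entry of a serialized output schema as tensor or column."""
    labels = []
    for spec in json.loads(outputs_json):
        if spec.get("type") == "tensor":
            tensor = spec["tensor-spec"]
            labels.append(f"tensor {tensor['dtype']} {tensor['shape']}")
        else:
            labels.append(f"column {spec.get('name')} {spec['type']}")
    return labels


# Hand-written samples in the assumed layout (not copied from a real run).
tensor_outputs = '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1]}}]'
column_outputs = '[{"type": "long", "name": "prediction", "required": true}]'

print(describe_output_specs(tensor_outputs))
print(describe_output_specs(column_outputs))
```

The Tensor output in the sklearn case is likely a consequence of `predict` returning a NumPy array, from which the signature is inferred, rather than an indication that the model is treated as a deep learning model.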

 

 

 

 

 
