Cannot log SparkML model to Unity Catalog due to missing output signature

migq2
New Contributor III

I am training a Spark ML model (specifically, a SynapseML LightGBM model) in Databricks using MLflow with autologging enabled.

When I try to register my model in Unity Catalog, I get the following error:

 

MlflowException: Model passed for registration contained a signature that includes only inputs. All models in the Unity Catalog must be logged with a model signature containing both input and output type specifications

 

After some research, I found that the MLflow autologger correctly infers my model's input signature but leaves the output signature empty, and an output signature is required to register the model in UC.

I was able to circumvent this by using the following code to set my signature manually:

 

 

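import mlflow
from mlflow.models import ModelSignature

# URI of the model logged under the active run
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_info = mlflow.models.get_model_info(model_uri)

# Keep the inferred inputs and fill in the missing output schema (a JSON string)
signature_dict = model_info.signature.to_dict()
signature_dict["outputs"] = '[{"type": "double", "name": "prediction", "required": false}]'

# Rebuild the signature and attach it to the logged model
new_signature = ModelSignature.from_dict(signature_dict)
mlflow.models.set_signature(model_uri, new_signature)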

 

This seems to work, but it feels hacky and too manual. Is there a way to make the MLflow autologger infer and record the model's output signature itself, so that this additional manual step isn't needed?
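To make the manual part a little less error-prone in the meantime, the hand-written JSON string could be built with `json.dumps` instead. This is only a minimal sketch: `with_prediction_output` is a hypothetical helper name, and it assumes the signature dict stores `inputs` and `outputs` as JSON strings, which is how the workaround above treats it. The `example` dict is a made-up stand-in for `model_info.signature.to_dict()`.

```python
import json


def with_prediction_output(signature_dict, name="prediction", dtype="double"):
    """Return a copy of signature_dict with a single-column output schema.

    Hypothetical helper: assumes "outputs" is stored as a JSON string,
    matching the manual workaround above.
    """
    patched = dict(signature_dict)  # shallow copy, leaves the original untouched
    patched["outputs"] = json.dumps(
        [{"type": dtype, "name": name, "required": False}]
    )
    return patched


# Made-up stand-in for model_info.signature.to_dict()
example = {
    "inputs": json.dumps([{"type": "double", "name": "feature_1", "required": True}]),
    "outputs": None,
}

patched = with_prediction_output(example)
print(patched["outputs"])
```

The returned dict would then go through `ModelSignature.from_dict` and `mlflow.models.set_signature` exactly as before, so this does not remove the manual step. It only avoids quoting mistakes in the output schema.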

Has anyone found a more elegant solution?