I am training a Spark ML model (specifically a SynapseML LightGBM model) in Databricks, tracking it with MLflow autologging.
When I try to register the model in Unity Catalog, I get the following error:
MlflowException: Model passed for registration contained a signature that includes only inputs. All models in the Unity Catalog must be logged with a model signature containing both input and output type specifications
After some research, I found that the MLflow autologger correctly infers the model's input signature but leaves the output signature empty. Unity Catalog (UC) requires both to register the model.
I was able to work around this by setting the signature manually with the following code:
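To make the requirement concrete, here is a small stdlib-only check that could be run on `model_info.signature.to_dict()` before attempting registration. It is a sketch under two assumptions: that the dict has the layout produced by `ModelSignature.to_dict()` (JSON-encoded column lists under `"inputs"` and `"outputs"`, or `None` when a schema is absent), and that an empty column list should count as missing. `has_complete_signature` is a hypothetical helper that mirrors the condition in the error message; it is not MLflow's own validation code, and the feature column name in the example is made up.

```python
import json
from typing import Optional


def has_complete_signature(signature_dict: Optional[dict]) -> bool:
    """Return True if a signature dict declares both inputs and outputs.

    Hypothetical helper. Assumes the ModelSignature.to_dict() layout, where
    "inputs" and "outputs" are JSON-encoded lists of column specs, or None
    when absent. Mirrors the Unity Catalog error message; it is not MLflow's
    own validation logic.
    """
    if not signature_dict:
        return False
    for key in ("inputs", "outputs"):
        raw = signature_dict.get(key)
        if raw is None:
            return False
        # Treat an empty column list ("[]") the same as a missing schema.
        if len(json.loads(raw)) == 0:
            return False
    return True


# Shape of an autologged signature with inputs only
# (the feature column name is made up for illustration).
autologged = {
    "inputs": '[{"type": "double", "name": "feature_1", "required": true}]',
    "outputs": None,
    "params": None,
}
print(has_complete_signature(autologged))  # False
```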
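import mlflow
from mlflow.models import ModelSignature
# URI of the model that autolog logged under the "model" artifact path
# (assumes the run is still active; otherwise build the URI from a known run ID)
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_info = mlflow.models.get_model_info(model_uri)
# Autolog filled in "inputs" only; add a single "prediction" output column
signature_dict = model_info.signature.to_dict()
signature_dict["outputs"] = '[{"type": "double", "name": "prediction", "required": false}]'
new_signature = ModelSignature.from_dict(signature_dict)
# Overwrite the signature stored with the logged model
mlflow.models.set_signature(model_uri, new_signature)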
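One way to make the dict edit in the snippet above less fragile is to generate the output JSON with `json.dumps` instead of a hand-written string, and to leave any existing output schema alone. The sketch below assumes the same `ModelSignature.to_dict()` layout (JSON strings under `"inputs"` and `"outputs"`). `with_output_schema` and its `name`/`col_type` parameters are hypothetical; the defaults simply reproduce the `prediction`/`double` column from the snippet.

```python
import copy
import json


def with_output_schema(signature_dict: dict, name: str = "prediction",
                       col_type: str = "double") -> dict:
    """Return a copy of signature_dict with a single-column output schema.

    Hypothetical helper. Assumes the ModelSignature.to_dict() layout (JSON
    strings for "inputs" and "outputs"). An existing, non-empty output schema
    is returned unchanged; the input dict is never mutated.
    """
    patched = copy.deepcopy(signature_dict)
    existing = patched.get("outputs")
    if existing and json.loads(existing):
        return patched
    # json.dumps guarantees valid JSON (quoting, lowercase booleans).
    patched["outputs"] = json.dumps(
        [{"type": col_type, "name": name, "required": False}]
    )
    return patched


sig = {
    "inputs": '[{"type": "double", "name": "feature_1", "required": true}]',
    "outputs": None,
    "params": None,
}
patched = with_output_schema(sig)
print(patched["outputs"])
# [{"type": "double", "name": "prediction", "required": false}]
```

The patched dict can then go through `ModelSignature.from_dict` and `mlflow.models.set_signature` exactly as in the original snippet. Returning a copy rather than editing in place keeps the autologged signature available for comparison.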
This works, but it feels hacky and too manual. Is there a way to make the MLflow autologger infer the output signature as well, so that the model can be registered in Unity Catalog without this extra manual step?
Has anyone found a more elegant solution?