Good question! I'll divide my suggestions into two parts:
(1) In terms of MLflow Tracking, clustering is pretty similar to other ML workflows, so not much changes.
(2) In terms of the specific parameters, metrics, etc. to track, clustering is very different, so it helps to know which ones are commonly useful.
For (1), the generic pieces of an ML workflow should be tracked in the same way as for classification, regression, and other problems:
- Params, especially whatever hyperparameters you changed from defaults
- Metrics (see below)
- Data source and version
- Code / notebook
- etc.
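
To make (1) a bit more concrete, here is a minimal sketch of how those generic pieces could be logged. The helper names (`changed_params`, `data_fingerprint`, `log_clustering_run`) and the tag keys `data_source` / `data_version` are hypothetical choices of mine for illustration; the only MLflow calls used are `mlflow.start_run`, `mlflow.log_params`, `mlflow.log_metrics`, and `mlflow.set_tags`. Hashing the raw data is just one assumed way to get a "data version"; a table version or commit ID would work as well.

```python
import hashlib


def changed_params(params, defaults):
    """Keep only the hyperparameters whose values differ from the defaults."""
    return {
        name: value
        for name, value in params.items()
        if name not in defaults or defaults[name] != value
    }


def data_fingerprint(raw_bytes):
    """Short content hash used as a stand-in for a data version (an assumption)."""
    return hashlib.sha256(raw_bytes).hexdigest()[:12]


def log_clustering_run(params, metrics, data_source, data_version):
    """Log params, metrics, and data provenance to a single MLflow run."""
    import mlflow  # imported here so the helpers above work without MLflow installed

    with mlflow.start_run():
        mlflow.log_params(params)    # dict of hyperparameter name -> value
        mlflow.log_metrics(metrics)  # dict of metric name -> float
        # Hypothetical tag keys; pick whatever naming your team already uses.
        mlflow.set_tags({"data_source": data_source, "data_version": data_version})
```

You would call `log_clustering_run(changed_params(my_params, my_defaults), my_metrics, "path/or/table/name", data_fingerprint(my_bytes))` after fitting the model, where all of the `my_*` values are placeholders for your own objects.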
For (2), I'll list some of my recommendations for important params, metrics, etc., but I'd be interested to hear from others, especially if you have links to more detailed resources.
The "right" metrics to use can be very problem-dependent and model-dependent. At a high level, I'd make sure to log:
Hope this helps!