
What is the best practice for applying MLFlow to clustering algorithms?

User16826993440
New Contributor III

What is the best practice for applying MLFlow to clustering algorithms? What are the kinds of metrics customers track?

1 REPLY

Joseph_B
New Contributor III

Good question! I'll divide my suggestions into 2 parts:

(1) In terms of MLflow Tracking, clustering is pretty similar to other ML workflows, so not much changes.

(2) In terms of the specific parameters, metrics, etc. to track, clustering is very different, so it helps to know which things are commonly worth tracking.

For (1), the generic pieces of an ML workflow should be tracked in the same way as for classification, regression, and other problems:

  • Params, especially whatever hyperparameters you changed from defaults
  • Metrics (see below)
  • Data source and version
  • Code / notebook
  • etc.
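
To make (1) concrete, here is a minimal sketch of what tracking a clustering run could look like. It assumes scikit-learn's KMeans on synthetic data. The metrics shown (inertia, silhouette, Davies-Bouldin) are illustrative choices on my part rather than a recommended list, and the `data_source` / `data_version` tag names and the helper names `fit_and_score` and `log_clustering_run` are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score


def fit_and_score(X, n_clusters=3, random_state=0):
    """Fit KMeans and return (params, metrics) as plain dicts."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(X)
    # Params: the hyperparameters that were set explicitly for this run.
    params = {
        "algorithm": "KMeans",
        "n_clusters": n_clusters,
        "n_init": 10,
        "random_state": random_state,
    }
    # Metrics: example internal (label-free) clustering scores.
    metrics = {
        "inertia": float(model.inertia_),
        "silhouette": float(silhouette_score(X, labels)),
        "davies_bouldin": float(davies_bouldin_score(X, labels)),
    }
    return params, metrics


def log_clustering_run(X, data_source, data_version, **kmeans_kwargs):
    """Log one clustering run to MLflow, the same way as any other ML run."""
    # Imported here so fit_and_score can be used without MLflow installed.
    import mlflow

    params, metrics = fit_and_score(X, **kmeans_kwargs)
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        # Hypothetical tag names for recording the data source and version.
        mlflow.set_tags({"data_source": data_source, "data_version": data_version})
    return params, metrics


# Synthetic data, for illustration only.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
params, metrics = fit_and_score(X, n_clusters=3)
```

Calling `log_clustering_run(X, "synthetic_blobs", "v1", n_clusters=3)` would then record the same params and metrics as a single MLflow run, so runs with different `n_clusters` values can be compared side by side in the tracking UI.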

For (2), I'll list some recommendations I have for important params, metrics, etc., but I'll be interested to hear from others, especially if you have links to more detailed resources.

The "right" metrics to use can be very problem-dependent and model-dependent. At a high level, I'd make sure to log:

Hope this helps!
