<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is the best practice for applying MLFlow to clustering algorithms? in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/what-is-the-best-practice-for-applying-mlflow-to-clustering/m-p/92990#M3715</link>
    <description>&lt;P&gt;Does it make sense to register a K-means clustering model once the experiment has been tracked and you are satisfied with the outcome? If so, how do you do it?&lt;/P&gt;</description>
    <pubDate>Mon, 07 Oct 2024 17:30:31 GMT</pubDate>
    <dc:creator>wallco26</dc:creator>
    <dc:date>2024-10-07T17:30:31Z</dc:date>
    <item>
      <title>What is the best practice for applying MLFlow to clustering algorithms?</title>
      <link>https://community.databricks.com/t5/machine-learning/what-is-the-best-practice-for-applying-mlflow-to-clustering/m-p/25597#M1411</link>
      <description>&lt;P&gt;What is the best practice for applying MLflow to clustering algorithms? What kinds of metrics do customers track?&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jun 2021 16:42:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/what-is-the-best-practice-for-applying-mlflow-to-clustering/m-p/25597#M1411</guid>
      <dc:creator>User16826993440</dc:creator>
      <dc:date>2021-06-08T16:42:39Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best practice for applying MLFlow to clustering algorithms?</title>
      <link>https://community.databricks.com/t5/machine-learning/what-is-the-best-practice-for-applying-mlflow-to-clustering/m-p/25598#M1412</link>
      <description>&lt;P&gt;Good question! I'll divide my suggestions into two parts:&lt;/P&gt;&lt;P&gt;(1) In terms of MLflow Tracking, clustering is pretty similar to other ML workflows, so not much changes.&lt;/P&gt;&lt;P&gt;(2) In terms of the specific parameters, metrics, etc. to track, clustering is very different, so it helps to know which ones are common and useful.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For (1), the generic pieces of an ML workflow should be tracked the same way as for classification, regression, and other problems:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Params, especially any hyperparameters you changed from their defaults&lt;/LI&gt;&lt;LI&gt;Metrics (see below)&lt;/LI&gt;&lt;LI&gt;Data source and version&lt;/LI&gt;&lt;LI&gt;Code / notebook&lt;/LI&gt;&lt;LI&gt;etc.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;For (2)&lt;/B&gt;, I'll list some recommendations for important params, metrics, etc., but I'd be interested to hear from others, especially if you have links to more detailed resources.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The "right" metrics to use can be very problem-dependent and model-dependent. At a high level, I'd make sure to log:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The metric your algorithm is optimizing: For example, K-means minimizes the within-cluster sum of squared Euclidean distances (the inertia). The scikit-learn documentation has a great list of metrics ("geometry") for the models it supports: &lt;A href="https://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods" target="test_blank"&gt;https://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;The metric you care most about: For example, if you know the ground-truth assignments, you might use the Rand index. If you don't have ground truth, you might use the silhouette coefficient.
The scikit-learn documentation has lengthy explanations of several clustering metrics: &lt;A href="https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation" target="test_blank"&gt;https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation&lt;/A&gt; The Wikipedia page is good too: &lt;A href="https://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_and_assessment" target="test_blank"&gt;https://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_and_assessment&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Both of the above, computed on both the training and the validation data&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 21:34:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/what-is-the-best-practice-for-applying-mlflow-to-clustering/m-p/25598#M1412</guid>
      <dc:creator>Joseph_B</dc:creator>
      <dc:date>2021-06-18T21:34:39Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best practice for applying MLFlow to clustering algorithms?</title>
      <link>https://community.databricks.com/t5/machine-learning/what-is-the-best-practice-for-applying-mlflow-to-clustering/m-p/92990#M3715</link>
      <description>&lt;P&gt;Does it make sense to register a K-means clustering model once the experiment has been tracked and you are satisfied with the outcome? If so, how do you do it?&lt;/P&gt;</description>
      <pubDate>Mon, 07 Oct 2024 17:30:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/what-is-the-best-practice-for-applying-mlflow-to-clustering/m-p/92990#M3715</guid>
      <dc:creator>wallco26</dc:creator>
      <dc:date>2024-10-07T17:30:31Z</dc:date>
    </item>
  </channel>
</rss>

