<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: What's the best way to use hyperopt to train a spark.ml model and track automatically with mlflow? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-use-hyperopt-to-train-a-spark-ml-model/m-p/26240#M18348</link>
    <description>&lt;P&gt;It's actually pretty simple: use hyperopt, but with "Trials", not "SparkTrials". SparkTrials exists to parallelize tuning of single-machine models across a cluster; a spark.ml model is already trained in a distributed way, so the trials themselves should run sequentially on the driver. You get parallelism from Spark itself, not from the tuning process.&lt;/P&gt;</description>
    <pubDate>Fri, 18 Jun 2021 00:00:45 GMT</pubDate>
    <dc:creator>sean_owen</dc:creator>
    <dc:date>2021-06-18T00:00:45Z</dc:date>
    <item>
      <title>What's the best way to use hyperopt to train a spark.ml model and track automatically with mlflow?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-use-hyperopt-to-train-a-spark-ml-model/m-p/26239#M18347</link>
      <description>&lt;P&gt;I've read this &lt;A href="https://docs.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html" target="_blank"&gt;article&lt;/A&gt;, which covers:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Using CrossValidator or TrainValidationSplit to track hyperparameter tuning (no hyperopt); only random/grid search is supported&lt;/LI&gt;&lt;LI&gt;Parallel "single-machine" model training with hyperopt using hyperopt.SparkTrials (not spark.ml)&lt;/LI&gt;&lt;LI&gt;"Distributed training with Hyperopt and HorovodRunner": distributed deep learning with hyperopt (no MLflow)&lt;UL&gt;&lt;LI&gt;It does mention: "With HorovodRunner, you do not use the SparkTrials class, and you must manually call MLflow to log trials for Hyperopt."&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Is there an example notebook that shows how to hyperparameter tune a spark.ml model and log hyperparams/metrics/artifacts?&lt;/P&gt;</description>
      <pubDate>Fri, 04 Jun 2021 19:34:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-use-hyperopt-to-train-a-spark-ml-model/m-p/26239#M18347</guid>
      <dc:creator>User16752240150</dc:creator>
      <dc:date>2021-06-04T19:34:03Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to use hyperopt to train a spark.ml model and track automatically with mlflow?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-use-hyperopt-to-train-a-spark-ml-model/m-p/26240#M18348</link>
      <description>&lt;P&gt;It's actually pretty simple: use hyperopt, but with "Trials", not "SparkTrials". SparkTrials exists to parallelize tuning of single-machine models across a cluster; a spark.ml model is already trained in a distributed way, so the trials themselves should run sequentially on the driver. You get parallelism from Spark itself, not from the tuning process.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 00:00:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-way-to-use-hyperopt-to-train-a-spark-ml-model/m-p/26240#M18348</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2021-06-18T00:00:45Z</dc:date>
    </item>
  </channel>
</rss>

