<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Parallelization in training machine learning models using MLFlow in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27486#M1567</link>
    <description>&lt;P&gt;You can set num_workers to your default parallelism&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://databricks.github.io/spark-deep-learning/_modules/sparkdl/xgboost/xgboost.html" target="test_blank"&gt;https://databricks.github.io/spark-deep-learning/_modules/sparkdl/xgboost/xgboost.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 13 Oct 2022 17:58:54 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2022-10-13T17:58:54Z</dc:date>
    <item>
      <title>Parallelization in training machine learning models using MLFlow</title>
      <link>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27483#M1564</link>
      <description>&lt;P&gt;I'm training an ML model (e.g., XGBoost) and need to search over 5 hyperparameters. If each parameter has 5 candidate values, that is 5^5 = 3,125 combinations.&lt;/P&gt;&lt;P&gt;I want to parallelize the grid search over all of these combinations to find the best-performing model.&lt;/P&gt;&lt;P&gt;How can I achieve this on Databricks, especially using MLflow? I've been told I can define a function that trains and evaluates the model (using MLflow), build an array of all the hyperparameter combinations, call sc.parallelize on the array, and then map the function over it.&lt;/P&gt;&lt;P&gt;I have written the code that calls sc.parallelize on the array:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;paras_combo_test = [(x, y) for x in [50, 100, 150] for y in [0.8, 0.9, 0.95]]
sc.parallelize(paras_combo_test, 3).glom().collect()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;(For simplicity, I'm using just two parameters, x and y, which gives 9 combinations in total, divided into 3 partitions.)&lt;/P&gt;&lt;P&gt;How can I map the function that trains and evaluates the model (probably using MLflow) over these partitions, so that 3 workers run in parallel and each trains 3 models?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2022 14:55:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27483#M1564</guid>
      <dc:creator>ianchenmu</dc:creator>
      <dc:date>2022-10-13T14:55:19Z</dc:date>
    </item>
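    <!--
    Editorial sketch for the question above, not part of the original thread.
    It shows only the general pattern the question describes: enumerate the
    hyperparameter grid, then map one training function over it with a fixed
    number of parallel workers. It deliberately uses Python's standard library
    (a thread pool) as a local stand-in, so it is not Spark, not MLflow, and
    not the approach the replies recommend (they point to Hyperopt and to
    distributed XGBoost). The function train_and_evaluate and its scoring
    formula are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# The two-parameter grid from the question: 3 x 3 = 9 combinations.
param_grid = list(product([50, 100, 150], [0.8, 0.9, 0.95]))


def train_and_evaluate(params):
    """Hypothetical stand-in for a real training run; returns (params, score)."""
    n_estimators, subsample = params
    # Made-up score so the sketch runs without XGBoost. A real function would
    # fit a model here and log its parameters and metrics (e.g. with MLflow).
    score = subsample - abs(n_estimators - 100) / 1000
    return params, score


# Evaluate up to 3 combinations at a time; map() keeps the input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(train_and_evaluate, param_grid))

# Pick the combination with the highest score.
best_params, best_score = max(results, key=lambda r: r[1])
```

    The full search in the question would use product(...) over five lists of
    five values each, giving 5^5 = 3,125 combinations. Only the grid changes;
    the mapping step stays the same.
    -->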
    <item>
      <title>Re: Parallelization in training machine learning models using MLFlow</title>
      <link>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27484#M1565</link>
      <description>&lt;P&gt;This blog post should be very helpful:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.databricks.com/blog/2021/04/15/how-not-to-tune-your-model-with-hyperopt.html" alt="https://www.databricks.com/blog/2021/04/15/how-not-to-tune-your-model-with-hyperopt.html" target="_blank"&gt;https://www.databricks.com/blog/2021/04/15/how-not-to-tune-your-model-with-hyperopt.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Here are the docs on XGBoost:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/machine-learning/train-model/xgboost.html" target="test_blank"&gt;https://docs.databricks.com/machine-learning/train-model/xgboost.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;A simple rule: never use sc.parallelize.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2022 15:15:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27484#M1565</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-10-13T15:15:53Z</dc:date>
    </item>
    <item>
      <title>Re: Parallelization in training machine learning models using MLFlow</title>
      <link>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27485#M1566</link>
      <description>&lt;P&gt;Thanks @Joseph Kambourakis! It seems we can run distributed XGBoost training by setting num_workers according to the number of workers in the cluster. But can we also speed up training by setting a parameter that uses the number of cores in the cluster?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2022 17:00:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27485#M1566</guid>
      <dc:creator>ianchenmu</dc:creator>
      <dc:date>2022-10-13T17:00:41Z</dc:date>
    </item>
    <item>
      <title>Re: Parallelization in training machine learning models using MLFlow</title>
      <link>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27486#M1567</link>
      <description>&lt;P&gt;You can set num_workers to your default parallelism (in Spark, sc.defaultParallelism). See the source:&lt;/P&gt;&lt;P&gt;&lt;A href="https://databricks.github.io/spark-deep-learning/_modules/sparkdl/xgboost/xgboost.html" target="test_blank"&gt;https://databricks.github.io/spark-deep-learning/_modules/sparkdl/xgboost/xgboost.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2022 17:58:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27486#M1567</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-10-13T17:58:54Z</dc:date>
    </item>
    <item>
      <title>Re: Parallelization in training machine learning models using MLFlow</title>
      <link>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27487#M1568</link>
      <description>&lt;P&gt;collect() runs on the driver, so it offers no parallelism and can instead cause an out-of-memory (OOM) error.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Oct 2022 13:35:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27487#M1568</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-10-18T13:35:16Z</dc:date>
    </item>
    <item>
      <title>Re: Parallelization in training machine learning models using MLFlow</title>
      <link>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27488#M1569</link>
      <description>&lt;P&gt;Hi @Chen Mu,&lt;/P&gt;&lt;P&gt;Hope all is well!&lt;/P&gt;&lt;P&gt;Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or &lt;B&gt;mark an answer as best&lt;/B&gt;? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 21 Nov 2022 07:45:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/parallelization-in-training-machine-learning-models-using-mlflow/m-p/27488#M1569</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-21T07:45:45Z</dc:date>
    </item>
  </channel>
</rss>

