<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unable to use GridSearchCV from spark-sklearn due to 'fit_params' error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-use-gridsearchcv-from-spark-sklearn-due-to-fit-params/m-p/6047#M2297</link>
    <description>&lt;P&gt;When using GridSearchCV from spark-sklearn, I got the error described in &lt;A href="https://stackoverflow.com/questions/70921572/gridsearchcv-giving-init-got-an-unexpected-keyword-argument-fit-params" alt="https://stackoverflow.com/questions/70921572/gridsearchcv-giving-init-got-an-unexpected-keyword-argument-fit-params" target="_blank"&gt;GridSearchCV giving "__init__() got an unexpected keyword argument 'fit_params'" error&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;I am using sklearn 1.2.2 and spark-sklearn 0.3.0.&lt;/P&gt;&lt;P&gt;I think this is because GridSearchCV in spark-sklearn still passes the &lt;I&gt;fit_params&lt;/I&gt; parameter, which the GridSearchCV constructor in sklearn 1.2.2 no longer accepts.&lt;/P&gt;&lt;P&gt;I am trying to use Databricks' parallel processing to tune hyperparameters, following this example: &lt;A href="https://kb.databricks.com/en_US/machine-learning/kfold-cross-validation" alt="https://kb.databricks.com/en_US/machine-learning/kfold-cross-validation" target="_blank"&gt;https://kb.databricks.com/en_US/machine-learning/kfold-cross-validation&lt;/A&gt;&lt;/P&gt;&lt;P&gt;What would be a good solution?&lt;/P&gt;</description>
    <pubDate>Tue, 11 Apr 2023 18:08:07 GMT</pubDate>
    <dc:creator>gary7135</dc:creator>
    <dc:date>2023-04-11T18:08:07Z</dc:date>
    <item>
      <title>Unable to use GridSearchCV from spark-sklearn due to 'fit_params' error</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-use-gridsearchcv-from-spark-sklearn-due-to-fit-params/m-p/6047#M2297</link>
      <description>&lt;P&gt;When using GridSearchCV from spark-sklearn, I got the error described in &lt;A href="https://stackoverflow.com/questions/70921572/gridsearchcv-giving-init-got-an-unexpected-keyword-argument-fit-params" alt="https://stackoverflow.com/questions/70921572/gridsearchcv-giving-init-got-an-unexpected-keyword-argument-fit-params" target="_blank"&gt;GridSearchCV giving "__init__() got an unexpected keyword argument 'fit_params'" error&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;I am using sklearn 1.2.2 and spark-sklearn 0.3.0.&lt;/P&gt;&lt;P&gt;I think this is because GridSearchCV in spark-sklearn still passes the &lt;I&gt;fit_params&lt;/I&gt; parameter, which the GridSearchCV constructor in sklearn 1.2.2 no longer accepts.&lt;/P&gt;&lt;P&gt;I am trying to use Databricks' parallel processing to tune hyperparameters, following this example: &lt;A href="https://kb.databricks.com/en_US/machine-learning/kfold-cross-validation" alt="https://kb.databricks.com/en_US/machine-learning/kfold-cross-validation" target="_blank"&gt;https://kb.databricks.com/en_US/machine-learning/kfold-cross-validation&lt;/A&gt;&lt;/P&gt;&lt;P&gt;What would be a good solution?&lt;/P&gt;</description>
      <pubDate>Tue, 11 Apr 2023 18:08:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-use-gridsearchcv-from-spark-sklearn-due-to-fit-params/m-p/6047#M2297</guid>
      <dc:creator>gary7135</dc:creator>
      <dc:date>2023-04-11T18:08:07Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to use GridSearchCV from spark-sklearn due to 'fit_params' error</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-use-gridsearchcv-from-spark-sklearn-due-to-fit-params/m-p/6048#M2298</link>
      <description>&lt;P&gt;@Gary Mu:&lt;/P&gt;&lt;P&gt;Yes, you are correct. The error occurs because spark-sklearn 0.3.0 still passes fit_params to the GridSearchCV constructor, which scikit-learn 1.2.2 no longer accepts. The constructor argument was deprecated in scikit-learn 0.19 and removed in 0.21; fit parameters are now passed to the fit() method instead.&lt;/P&gt;&lt;P&gt;One possible solution is to downgrade scikit-learn to a release that is compatible with spark-sklearn 0.3.0 and still accepts fit_params in the constructor, that is, a release earlier than 0.21. Note that releases this old may not install on recent Python versions or Databricks runtimes.&lt;/P&gt;&lt;P&gt;Alternatively, you could modify the GridSearchCV implementation in spark-sklearn so that it no longer passes fit_params to the constructor. You can do this by forking the spark-sklearn repository, modifying the code, and building a new version of the package.&lt;/P&gt;&lt;P&gt;Another option is to use Hyperopt or Optuna for hyperparameter tuning instead of GridSearchCV. Both libraries support distributed tuning and can be used with Databricks.&lt;/P&gt;</description>
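      <content:encoded><![CDATA[
The failure described in this reply can be sketched without installing either library. The classes below are hypothetical stand-ins, not code from scikit-learn or spark-sklearn: NewBaseSearch plays the role of a search class whose constructor no longer takes fit_params, and LegacyWrapper plays the role of an older wrapper that still forwards it. The assumption is that the real error arises through the same kind of constructor forwarding.

```python
"""Stand-in classes only; nothing here imports scikit-learn or spark-sklearn."""


class NewBaseSearch:
    """Hypothetical stand-in: constructor takes no fit_params argument."""

    def __init__(self, estimator, param_grid):
        self.estimator = estimator
        self.param_grid = param_grid
        self.received_fit_params = {}

    def fit(self, X, y, **fit_params):
        # Fit-time options arrive here instead of through the constructor.
        self.received_fit_params = fit_params
        return self


class LegacyWrapper(NewBaseSearch):
    """Hypothetical stand-in: an older wrapper that still forwards fit_params."""

    def __init__(self, estimator, param_grid, fit_params=None):
        # This call raises TypeError because the parent rejects fit_params.
        super().__init__(estimator, param_grid, fit_params=fit_params)


def try_legacy():
    """Return the error message raised by the legacy wrapper, or None."""
    try:
        LegacyWrapper(estimator=object(), param_grid={"C": [1, 10]})
    except TypeError as exc:
        return str(exc)
    return None


legacy_error = try_legacy()

# The replacement pattern: pass fit-time options to fit(), not the constructor.
search = NewBaseSearch(object(), {"C": [1, 10]}).fit(
    [[0], [1]], [0, 1], sample_weight=[1.0, 2.0]
)
```

Removing the forwarded keyword from the wrapper's constructor call, as the forking suggestion above describes, is what makes the TypeError go away in this sketch; fit-time options such as the illustrative sample_weight then travel through fit().
]]></content:encoded>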
      <pubDate>Sun, 16 Apr 2023 00:59:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-use-gridsearchcv-from-spark-sklearn-due-to-fit-params/m-p/6048#M2298</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-16T00:59:30Z</dc:date>
    </item>
  </channel>
</rss>

