<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: XGBoost Feature Weighting in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/xgboost-feature-weighting/m-p/102590#M3863</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136903"&gt;@sjohnston2&lt;/a&gt;,&amp;nbsp;here is some information I found internally:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;H2 class="mb-2 mt-6 text-lg first:mt-3"&gt;Possible Causes&lt;/H2&gt;
&lt;OL class="marker:text-textOff list-decimal pl-8"&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Memory Access Issue&lt;/STRONG&gt;: A segmentation fault means the process tried to access memory it is not allowed to touch, which is different from running out of memory, so spare RAM on the cluster does not rule it out. It could be caused by an internal bug in XGBoost when processing certain feature weight configurations.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;XGBoost Version&lt;/STRONG&gt;: This could be a bug in the specific version of XGBoost you're using. Feature weights were added in version 1.3.0, so make sure you're on a recent, stable version.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Incompatible Feature Weights&lt;/STRONG&gt;: The error occurs with certain feature weight configurations but not others, indicating that the issue might be related to how XGBoost handles specific weight patterns.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;Try modifying your feature weights to avoid the configuration that causes the error. For example, start from all ones instead of all zeros so that no feature has a weight of zero:&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-textMainDark selection:!text-superDark selection:bg-superDuper/10 bg-offset dark:bg-offsetDark my-md relative flex flex-col rounded font-mono text-sm font-thin"&gt;
&lt;DIV class="top-headerHeight translate-y-xs -translate-x-xs bottom-xl mb-xl sticky flex h-0 items-start justify-end"&gt;
&lt;DIV class="flex items-center min-w-0 justify-center gap-xs"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV class="pr-lg"&gt;&lt;LI-CODE lang="markup"&gt;import numpy as np
feature_weights = np.ones(X_train.shape[1])  # Start with all weights set to 1
feature_weights[:10] = 2.0  # Increase weights for the first 10 features
&lt;/LI-CODE&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;&lt;LI-WRAPPER&gt;&lt;/LI-WRAPPER&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 18 Dec 2024 21:48:20 GMT</pubDate>
    <dc:creator>Walter_C</dc:creator>
    <dc:date>2024-12-18T21:48:20Z</dc:date>
    <item>
      <title>XGBoost Feature Weighting</title>
      <link>https://community.databricks.com/t5/machine-learning/xgboost-feature-weighting/m-p/102587#M3862</link>
      <description>&lt;P&gt;We are trying to train a predictive ML model using the XGBoost classifier. One requirement from our business team is to implement feature weighting, because they have defined certain features as mattering more than others. The dataset has 69 features.&lt;/P&gt;
&lt;P&gt;We are trying to fit the model with these parameters:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;model.fit(X_train,
          y_train,
          classifier__feature_weights=feature_weights,
          classifier__early_stopping_rounds=5,
          classifier__verbose=False,
          classifier__eval_set=[(X_val_processed, y_val_processed)])
&lt;/LI-CODE&gt;
&lt;P&gt;For testing, feature_weights is set as follows:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;feature_weights = np.zeros(X_train.shape[1])
feature_weights[:10] = 2.0
&lt;/LI-CODE&gt;
&lt;P&gt;When running this, we get the following error:&lt;/P&gt;
&lt;P&gt;The Python process exited with exit code 139 (SIGSEGV: Segmentation fault).&lt;/P&gt;
&lt;P&gt;However, when feature_weights is set like this, we don't get an error:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;feature_weights = np.zeros(X_train.shape[1])
feature_weights[:5] = 1.0
&lt;/LI-CODE&gt;
&lt;P&gt;Do you have any insight or advice on this error and how we can fix it going forward? Our research suggests it's a memory issue, but the cluster metrics show that 90 GB of 220 GB of memory is in use.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 21:27:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/xgboost-feature-weighting/m-p/102587#M3862</guid>
      <dc:creator>sjohnston2</dc:creator>
      <dc:date>2024-12-18T21:27:07Z</dc:date>
    </item>
    <item>
      <title>Re: XGBoost Feature Weighting</title>
      <link>https://community.databricks.com/t5/machine-learning/xgboost-feature-weighting/m-p/102590#M3863</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136903"&gt;@sjohnston2&lt;/a&gt;,&amp;nbsp;here is some information I found internally:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;H2 class="mb-2 mt-6 text-lg first:mt-3"&gt;Possible Causes&lt;/H2&gt;
&lt;OL class="marker:text-textOff list-decimal pl-8"&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Memory Access Issue&lt;/STRONG&gt;: A segmentation fault means the process tried to access memory it is not allowed to touch, which is different from running out of memory, so spare RAM on the cluster does not rule it out. It could be caused by an internal bug in XGBoost when processing certain feature weight configurations.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;XGBoost Version&lt;/STRONG&gt;: This could be a bug in the specific version of XGBoost you're using. Feature weights were added in version 1.3.0, so make sure you're on a recent, stable version.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Incompatible Feature Weights&lt;/STRONG&gt;: The error occurs with certain feature weight configurations but not others, indicating that the issue might be related to how XGBoost handles specific weight patterns.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
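&lt;P&gt;As a rough way to check the version point, the sketch below compares a version string against 1.3.0. The helper names parse_version, supports_feature_weights and installed_xgboost_version are made up for illustration, and the parsing assumes a plain dotted version such as 2.1.3 (any suffix after the digits is ignored).&lt;/P&gt;

```python
from importlib import metadata

# Feature weights were introduced in XGBoost 1.3.0 (per the note above).
MIN_VERSION = (1, 3, 0)


def parse_version(text):
    """Turn '1.7.6' into (1, 7, 6); stops at the first non-numeric part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.0' compares as (2, 0, 0).
    while len(parts) != 3:
        parts.append(0)
    return tuple(parts)


def supports_feature_weights(version_text):
    """True if the given version is at least MIN_VERSION."""
    return parse_version(version_text) >= MIN_VERSION


def installed_xgboost_version():
    """Return the installed xgboost version string, or None if not installed."""
    try:
        return metadata.version("xgboost")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_xgboost_version()
    if found is None:
        print("xgboost is not installed in this environment")
    else:
        print(found, "supports feature weights:", supports_feature_weights(found))
```

&lt;P&gt;Passing this check only tells you the feature exists in your version. It does not tell you whether a given release is free of the crash.&lt;/P&gt;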
&lt;P&gt;&lt;SPAN&gt;Try modifying your feature weights to avoid the configuration that causes the error. For example, start from all ones instead of all zeros so that no feature has a weight of zero:&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-textMainDark selection:!text-superDark selection:bg-superDuper/10 bg-offset dark:bg-offsetDark my-md relative flex flex-col rounded font-mono text-sm font-thin"&gt;
&lt;DIV class="top-headerHeight translate-y-xs -translate-x-xs bottom-xl mb-xl sticky flex h-0 items-start justify-end"&gt;
&lt;DIV class="flex items-center min-w-0 justify-center gap-xs"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV class="pr-lg"&gt;&lt;LI-CODE lang="markup"&gt;import numpy as np
feature_weights = np.ones(X_train.shape[1])  # Start with all weights set to 1
feature_weights[:10] = 2.0  # Increase weights for the first 10 features
&lt;/LI-CODE&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
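&lt;P&gt;Before passing weights to fit, a quick sanity check can catch the easy mistakes. This is a minimal sketch in plain Python. check_feature_weights is a hypothetical helper, not part of XGBoost, and the rules it enforces (one weight per feature, finite, non-negative, at least one positive) are assumptions about what a sensible configuration looks like, not a statement of what triggers the segmentation fault.&lt;/P&gt;

```python
import math


def check_feature_weights(weights, n_features):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    values = [float(w) for w in weights]
    if len(values) != n_features:
        problems.append(f"expected {n_features} weights, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        problems.append("weights contain NaN or infinity")
    elif not all(v >= 0 for v in values):
        problems.append("weights contain negative values")
    if not any(v > 0 for v in values):
        problems.append("no feature has a positive weight")
    return problems


# 69 features, as in the question: all ones, first ten raised to 2.0.
weights = [1.0] * 69
weights[:10] = [2.0] * 10
print(check_feature_weights(weights, 69))  # prints []
```

&lt;P&gt;An array that is zero for most features, like the np.zeros-based ones in the question, still passes as long as one weight is positive, so treat this as a first filter only.&lt;/P&gt;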
&lt;P&gt;&lt;LI-WRAPPER&gt;&lt;/LI-WRAPPER&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 21:48:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/xgboost-feature-weighting/m-p/102590#M3863</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-18T21:48:20Z</dc:date>
    </item>
    <item>
      <title>Re: XGBoost Feature Weighting</title>
      <link>https://community.databricks.com/t5/machine-learning/xgboost-feature-weighting/m-p/102676#M3867</link>
      <description>&lt;P&gt;Thanks for the response, Walter!&amp;nbsp;&lt;/P&gt;&lt;P&gt;It looks like the XGBoost version was what was causing the issue. After upgrading the version, rerunning our previous tests worked perfectly. Thank you so much for the help, and have a wonderful holiday!&lt;/P&gt;</description>
      <pubDate>Thu, 19 Dec 2024 15:35:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/xgboost-feature-weighting/m-p/102676#M3867</guid>
      <dc:creator>sjohnston2</dc:creator>
      <dc:date>2024-12-19T15:35:58Z</dc:date>
    </item>
  </channel>
</rss>

