<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic AutoML master notebook failing in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/automl-master-notebook-failing/m-p/111049#M3975</link>
    <description>&lt;P data-unlink="true"&gt;I recently ran AutoML successfully on one dataset.&amp;nbsp; It has now failed on a second dataset of similar construction, before producing any machine learning training runs or output.&amp;nbsp; The Experiments page says:&lt;/P&gt;&lt;P data-unlink="true"&gt;&lt;SPAN class=""&gt;```Model training failed&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN class=""&gt;For more information, visit the AutoML job run.&lt;BR /&gt;An unknown error occurred```&lt;/SPAN&gt;&lt;/P&gt;&lt;P data-unlink="true"&gt;The phrase "AutoML job run" links to a Run of an auto-generated training notebook.&amp;nbsp; In that notebook, the failure occurs in a cell whose contents are:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dkxxxrc_0-1740403690249.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15046iB14856C006BEC61A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="dkxxxrc_0-1740403690249.png" alt="dkxxxrc_0-1740403690249.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The error statement is "A column, variable, or function parameter with name `_automl_sample_weight_0000` cannot be resolved."&amp;nbsp; The name `_automl_sample_weight_0000` is, of course, not from my data. It is something that AutoML is creating, or failing to create.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I am not using Feature Store or anything especially clever in the ML pipeline.&amp;nbsp; My data simply comes from a Delta table, albeit a bigger one than the table on which AutoML worked for me.&amp;nbsp; This one is roughly 50,000 rows by 6,000 columns.&lt;/P&gt;&lt;P data-unlink="true"&gt;Any suggestions for fixing this?&lt;/P&gt;</description>
    <pubDate>Mon, 24 Feb 2025 13:33:38 GMT</pubDate>
    <dc:creator>dkxxx-rc</dc:creator>
    <dc:date>2025-02-24T13:33:38Z</dc:date>
    <item>
      <title>AutoML master notebook failing</title>
      <link>https://community.databricks.com/t5/machine-learning/automl-master-notebook-failing/m-p/111049#M3975</link>
      <description>&lt;P data-unlink="true"&gt;I recently ran AutoML successfully on one dataset.&amp;nbsp; It has now failed on a second dataset of similar construction, before producing any machine learning training runs or output.&amp;nbsp; The Experiments page says:&lt;/P&gt;&lt;P data-unlink="true"&gt;&lt;SPAN class=""&gt;```Model training failed&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN class=""&gt;For more information, visit the AutoML job run.&lt;BR /&gt;An unknown error occurred```&lt;/SPAN&gt;&lt;/P&gt;&lt;P data-unlink="true"&gt;The phrase "AutoML job run" links to a Run of an auto-generated training notebook.&amp;nbsp; In that notebook, the failure occurs in a cell whose contents are:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dkxxxrc_0-1740403690249.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15046iB14856C006BEC61A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="dkxxxrc_0-1740403690249.png" alt="dkxxxrc_0-1740403690249.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The error statement is "A column, variable, or function parameter with name `_automl_sample_weight_0000` cannot be resolved."&amp;nbsp; The name `_automl_sample_weight_0000` is, of course, not from my data. It is something that AutoML is creating, or failing to create.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I am not using Feature Store or anything especially clever in the ML pipeline.&amp;nbsp; My data simply comes from a Delta table, albeit a bigger one than the table on which AutoML worked for me.&amp;nbsp; This one is roughly 50,000 rows by 6,000 columns.&lt;/P&gt;&lt;P data-unlink="true"&gt;Any suggestions for fixing this?&lt;/P&gt;</description>
      <pubDate>Mon, 24 Feb 2025 13:33:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/automl-master-notebook-failing/m-p/111049#M3975</guid>
      <dc:creator>dkxxx-rc</dc:creator>
      <dc:date>2025-02-24T13:33:38Z</dc:date>
    </item>
    <item>
      <title>Re: AutoML master notebook failing</title>
      <link>https://community.databricks.com/t5/machine-learning/automl-master-notebook-failing/m-p/137055#M4395</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/137009"&gt;@dkxxx-rc&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Thanks for the detailed context. This error almost certainly comes from AutoML’s internal handling of imbalanced data and sampling, not from your dataset itself.&lt;/P&gt;
&lt;P&gt;AutoML creates the internal column _automl_sample_weight_0000 when it detects class imbalance and applies class weighting or sampling. In some ML runtime versions, a bug can make AutoML reference that column before it is properly materialized, which produces the “cannot be resolved” error.&lt;/P&gt;
&lt;P&gt;This shows up more often when AutoML needs to sample due to memory constraints (wide/high‑dimensional tables or insufficient per‑core memory on the worker/driver). AutoML’s sampling behavior depends strongly on memory per core, and datasets are sampled when the estimated memory exceeds available resources.&lt;/P&gt;
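&lt;P&gt;For a rough sense of scale, the raw size of the table described in the question can be estimated with simple arithmetic. This is only a back-of-the-envelope sketch: it assumes every value is stored as an 8-byte float, the helper name dense_size_gib is made up for illustration, and it is not the estimate AutoML itself computes.&lt;/P&gt;

```python
def dense_size_gib(n_rows, n_cols, bytes_per_value=8):
    """Rough in-memory size of a dense numeric table, in GiB.

    Assumption: every cell is one fixed-width value (8 bytes for float64).
    """
    return n_rows * n_cols * bytes_per_value / 1024**3


# Dimensions quoted in the question: about 50,000 rows by 6,000 columns.
size = dense_size_gib(50_000, 6_000)
print(f"{size:.2f} GiB")  # prints "2.24 GiB"
```

&lt;P&gt;Training usually holds more than one copy of the data in memory (splits, encoded features), so the working footprint can be a multiple of this figure, which is why memory per core matters here.&lt;/P&gt;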
&lt;P&gt;My main suggestion is to reduce the number of columns you pass to AutoML from 6000 to something significantly smaller. A few thousand of those columns are likely to be useless to the ML model, and pruning them before handing the dataset to AutoML should noticeably improve the chances of a successful run.&lt;/P&gt;
&lt;P&gt;Removing low variance features and highly correlated features would be a good start.&lt;/P&gt;
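&lt;P&gt;A minimal sketch of that pruning step, assuming the data (or a row sample of it) fits in a pandas DataFrame, for example after calling toPandas() on a Spark DataFrame. The function name prune_features and both threshold values are illustrative assumptions, not AutoML settings.&lt;/P&gt;

```python
import numpy as np
import pandas as pd


def prune_features(df, target_col, var_threshold=1e-8, corr_threshold=0.95):
    """Drop near-constant and highly correlated numeric feature columns.

    Non-numeric columns and the target column are left untouched.
    """
    numeric = df.drop(columns=[target_col]).select_dtypes(include="number")

    # 1) Low variance: flag columns whose variance is at or below the threshold.
    variances = numeric.var()
    low_var = list(variances[variances.le(var_threshold)].index)
    numeric = numeric.drop(columns=low_var)

    # 2) High correlation: keep only the upper triangle of the absolute
    #    correlation matrix so each pair is checked once, then flag any
    #    column that is strongly correlated with an earlier column.
    corr = numeric.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    high_corr = [c for c in upper.columns if upper[c].gt(corr_threshold).any()]

    return df.drop(columns=low_var + high_corr)
```

&lt;P&gt;Usage would look like pruned = prune_features(pdf, target_col="label"), with the pruned table passed to AutoML in place of the full one. Computing a full correlation matrix over 6000 columns is itself fairly heavy, so running this on a row sample may be more practical.&lt;/P&gt;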
&lt;P&gt;Alternatively (and perhaps in addition to pruning the feature set), you can use clusters with significantly more memory per core. Do you happen to know what your current cluster configuration is?&lt;/P&gt;</description>
      <pubDate>Fri, 31 Oct 2025 15:49:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/automl-master-notebook-failing/m-p/137055#M4395</guid>
      <dc:creator>stbjelcevic</dc:creator>
      <dc:date>2025-10-31T15:49:16Z</dc:date>
    </item>
    <item>
      <title>Re: AutoML master notebook failing</title>
      <link>https://community.databricks.com/t5/machine-learning/automl-master-notebook-failing/m-p/137394#M4401</link>
      <description>&lt;P&gt;I have been building all my own models lately rather than using AutoML, so I won't have any new experiences or attempts to report in this thread.&amp;nbsp; However, your insight into what's happening under the hood is valuable and enlightening, and will likely do me some good in the long run.&amp;nbsp; Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2025 12:03:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/automl-master-notebook-failing/m-p/137394#M4401</guid>
      <dc:creator>dkxxx-rc</dc:creator>
      <dc:date>2025-11-03T12:03:36Z</dc:date>
    </item>
  </channel>
</rss>

