10-31-2025 08:49 AM
Hi @dkxxx-rc,
Thanks for the detailed context. This error is almost certainly coming from AutoML’s internal handling of imbalanced data and sampling, not your dataset itself.
AutoML creates the internal column _automl_sample_weight_0000 when it detects class imbalance and applies class weighting/sampling. In some ML runtime versions, a bug can make AutoML reference that column before it has been properly materialized, which produces the "cannot be resolved" error.
This shows up more often when AutoML needs to sample due to memory constraints (wide/high‑dimensional tables or insufficient per‑core memory on the worker/driver). AutoML’s sampling behavior depends strongly on memory per core, and datasets are sampled when the estimated memory exceeds available resources.
My main suggestion is to reduce the number of columns you pass to AutoML from 6000 to something significantly smaller. A few thousand of those columns are likely useless to the model, and some light preprocessing before handing the dataset to AutoML should noticeably improve its chances of completing successfully.
Removing low-variance features and highly correlated features would be a good start.
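Here is a rough sketch of that pruning step in pandas, in case it helps. It assumes you can pull the data (or a representative sample of it) into a pandas DataFrame, and the function name `prune_features` and both thresholds are placeholders I made up for illustration, not anything AutoML provides. It only evaluates numeric columns; non-numeric columns are passed through untouched.

```python
import numpy as np
import pandas as pd


def prune_features(df, label_col, var_threshold=1e-8, corr_threshold=0.95):
    """Return the columns to keep after dropping low-variance and
    highly correlated numeric features. Thresholds are illustrative."""
    candidates = df.drop(columns=[label_col])
    numeric = candidates.select_dtypes(include="number")
    non_numeric = [c for c in candidates.columns if c not in numeric.columns]

    # 1) Drop (near-)constant columns: their variance is at or below the threshold.
    variances = numeric.var()
    numeric = numeric.loc[:, variances > var_threshold]

    # 2) Look only at the upper triangle of the absolute correlation matrix,
    #    so that for each highly correlated pair the later column is dropped.
    corr = numeric.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = {col for col in upper.columns if (upper[col] > corr_threshold).any()}

    kept_numeric = [c for c in numeric.columns if c not in to_drop]
    return kept_numeric + non_numeric + [label_col]


# Hypothetical usage: keep only the surviving columns before running AutoML.
# kept = prune_features(pdf, label_col="label")
# reduced = pdf[kept]
```

With 6000 columns the correlation matrix is about 36 million entries, which is manageable on a driver with a few GB free, but computing it on a row sample first is a reasonable way to keep this step cheap.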
Alternatively (and perhaps in addition to pruning the feature set), you can use a cluster with significantly more memory per core. Do you happen to know what your current configuration is?