TL;DR:
When an AutoML run decides it needs to sample the dataset because the driver/worker node memory is not enough to load and process all of it, the run fails. I did NOT provide a sample weight column, but somewhere in the process AutoML appears to assume one was supplied, tries to find it, and hits this error:
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `_automl_sample_weight_0000` cannot be resolved. Did you mean one of the following? [`_automl_split_col_0000`, `t*****d`, `o******l`, `s*******o`, `f*******o`]. SQLSTATE: 42703
More details:
I am running an AutoML experiment on a cluster with the 15.4 LTS ML runtime. The driver node type is i3.2xlarge and the worker node type is r5d.2xlarge. I see this in the AutoML run log:
2024/11/22 17:41:53 INFO databricks.automl.internal.size_estimator:
mem_req_data_load_mb = 52148.01129505962
mem_req_training_mb_dense = 30748.212867736816
mem_req_training_mb_sparse = 28898.903835296627
mem_req_training_mb = 30748.212867736816
2024/11/22 17:41:53 INFO databricks.automl.internal.size_estimator: fraction (0.34224123905739046) = min of
(available_memory_mb_per_trial (41139.0) / worker_max_memory_req_mb (52148.01129505962)),
(self._memory_mb_on_driver (17847.199999999997) / mem_req_data_load_mb (52148.01129505962))
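For reference, here is a minimal sketch that reproduces the arithmetic from the log lines above. The variable names mirror the ones printed in the log; this is only my reading of the log message, not the actual (closed) AutoML source.

```python
# Values copied verbatim from the size_estimator log lines above.
available_memory_mb_per_trial = 41139.0
worker_max_memory_req_mb = 52148.01129505962
memory_mb_on_driver = 17847.199999999997
mem_req_data_load_mb = 52148.01129505962

# The log says the sampling fraction is the min of these two ratios.
worker_ratio = available_memory_mb_per_trial / worker_max_memory_req_mb
driver_ratio = memory_mb_on_driver / mem_req_data_load_mb
fraction = min(worker_ratio, driver_ratio)

# Whichever ratio is smaller is the side that forces the sampling.
limiting = "driver" if driver_ratio < worker_ratio else "worker"

print(f"worker ratio: {worker_ratio:.4f}")
print(f"driver ratio: {driver_ratio:.4f}")
print(f"sampling fraction: {fraction:.4f} (limited by the {limiting})")
```

The fraction is below 1, so the run has to sample, and the driver is the limiting side: roughly 17,847 MB available against roughly 52,148 MB needed to load the data.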
At this point, AutoML appears to enter the sampling code path, and then it fails. I am attaching the stack trace as a PDF.
See the snippet below from the log, which shows that sample_weight_col is NOT provided:
2024/11/22 17:38:56 INFO databricks.automl.internal.supervised_learner: AutoML called with params: target_col=w***s, data_dir=None exclude_cols=None exclude_columns=['p***d', 'E***d', 'T***D', 'M***e', 'T***y'] exclude_frameworks=['lightgbm', 'sklearn'] imputers=None metric=roc_auc max_trials=10000000000.0 timeout_minutes=120 experiment_id=4***6 time_col=None experiment_dir=/Users/s***m/databricks_automl pos_label=1 split_col=None sample_weight_col=None
The 2nd image above (Classifier._sample) clearly shows that, at that point, `sample_weight_col` is not None (execution goes inside the `if`). The 1st image shows the part that looks suspicious to me, which is why I included it.
Any help is much appreciated! I believe this part of the source code is internal and closed, so posting here is the best I could do.