AutoML split with dt column not working properly
01-24-2024 05:17 AM
I am using AutoML and want to split my data into train, validation, and test sets using a dt column (one date for training, a different date for validation, and a third date for testing). The problem is that AutoML fails: there are only training metrics (no validation or test metrics), and when I check the data exploration notebook, all samples seem to be treated as training data even though their dt values differ. When I look at the model artifacts, I see that the dt column was used as a feature by the model.
01-24-2024 05:25 AM
This is what I see in my data exploration notebook: all dates are considered part of the training split.
08-28-2024 04:50 PM
Hello! Did you try specifying a column name as the manual split column?
Then you can fully control which rows go into the train / validate / test splits: https://docs.databricks.com/en/machine-learning/automl/automl-data-preparation.html#split-data-for-r...
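A minimal sketch of building such a split column from the dt values, assuming each dt maps to exactly one split and that AutoML expects the labels `train`, `validate`, and `test` (as I understand the linked page; please verify). The dates, the column name `split`, and the helper `add_split_column` are hypothetical. It uses plain Python dicts so it runs anywhere; on Databricks you would build the same column in Spark or pandas before calling AutoML.

```python
from collections import Counter

# Hypothetical mapping: replace with the dt values you actually use per split.
SPLIT_BY_DT = {
    "2024-01-01": "train",
    "2024-01-02": "validate",
    "2024-01-03": "test",
}


def add_split_column(rows, split_by_dt, dt_col="dt", split_col="split"):
    """Return copies of rows with a split label derived from their dt value.

    str() lets dt be either a 'YYYY-MM-DD' string or a datetime.date.
    An unmapped dt raises ValueError instead of silently landing in training.
    """
    labelled = []
    for row in rows:
        dt = str(row[dt_col])
        if dt not in split_by_dt:
            raise ValueError(f"dt value {dt!r} is not assigned to a split")
        labelled.append({**row, split_col: split_by_dt[dt]})
    return labelled


if __name__ == "__main__":
    rows = [
        {"dt": "2024-01-01", "feature": 0.3, "label": 0},
        {"dt": "2024-01-01", "feature": 0.7, "label": 1},
        {"dt": "2024-01-02", "feature": 0.5, "label": 1},
        {"dt": "2024-01-03", "feature": 0.1, "label": 0},
    ]
    labelled = add_split_column(rows, SPLIT_BY_DT)
    # Sanity check: every split should be non-empty before starting AutoML.
    print(Counter(r["split"] for r in labelled))

    # On Databricks the AutoML call might then look like this. Assumption:
    # split_col and exclude_cols are available on your runtime; check the docs.
    #   from databricks import automl
    #   automl.classify(df, target_col="label", split_col="split", exclude_cols=["dt"])
```

Excluding `dt` from the features (if your runtime supports it) would also address the model using the date column as an input, which is what the model artifacts showed.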

