Databricks Community

Noura_azza · ‎01-24-2024

I am using AutoML and want to split my data into train, validation, and test sets using a dt column (one date for train, a different date for validation, and a third date for test). The problem is that AutoML fails: there are only training metrics (no validation or test ones), and when I check the data exploration notebook, all samples appear to be treated as training data even though their dt values differ. When I look at the model artifacts, I see that the dt column was used as a feature by the model.

Noura_azza · ‎01-24-2024

This is what I see in my data exploration notebook. All dates are considered part of the training split.

maggiewang · ‎08-28-2024

Hello! Did you try specifying a column name as the manual split column?

Then you can fully control which rows are in train / validate / test splits: https://docs.databricks.com/en/machine-learning/automl/automl-data-preparation.html#split-data-for-r...
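A minimal sketch of that suggestion, under some assumptions: the three dates, the `feature` and `label` columns, and the `split` column name are hypothetical placeholders, and the labels "train", "validate", and "test" follow the linked documentation as I understand it. The mapping is shown in plain Python so it runs anywhere; the AutoML call is left as a comment because it only runs on a Databricks ML runtime, and the `split_col` and `exclude_cols` arguments should be checked against the docs for your runtime version.

```python
from datetime import date

# Hypothetical dates; replace with the actual dt values in your table.
SPLIT_BY_DATE = {
    date(2024, 1, 1): "train",
    date(2024, 1, 2): "validate",
    date(2024, 1, 3): "test",
}


def assign_split(dt):
    """Return the split label for one dt value; fail loudly on unexpected dates."""
    try:
        return SPLIT_BY_DATE[dt]
    except KeyError:
        raise ValueError(f"no split assigned for dt={dt!r}") from None


# Toy rows standing in for the real dataset (illustrative values only).
rows = [
    {"dt": date(2024, 1, 1), "feature": 0.1, "label": 0},
    {"dt": date(2024, 1, 2), "feature": 0.7, "label": 1},
    {"dt": date(2024, 1, 3), "feature": 0.4, "label": 0},
]

# Add an explicit split column instead of relying on dt itself.
for row in rows:
    row["split"] = assign_split(row["dt"])

# On a Databricks ML runtime, with `df` being the same data as a DataFrame
# that includes the new "split" column (assumed call, verify against the docs):
#
# from databricks import automl
# summary = automl.classify(
#     dataset=df,
#     target_col="label",
#     split_col="split",      # manual train / validate / test assignment
#     exclude_cols=["dt"],    # keep dt from being used as a feature
# )
```

Raising on an unmapped date is a deliberate choice here: a row with no split label would otherwise be silently dropped or misassigned, which is hard to spot from the metrics alone.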

AutoML split with dt column not working properly
