Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

How do I select an 80/10/10 split when doing AutoML?

bothma2
New Contributor II

Headline says it all. I am doing a regression and want to select a train/validation/test split that is not 60/20/20. Does anyone know how to do this?

3 REPLIES

mhiltner
Databricks Employee

You can't set the split ratio directly, but there is a workaround:

You could use a fake time column to force an 80/10/10 split.

If you specify this column (in the advanced section, under "Time column for training/validation/testing split"), the dataset is split into training, validation, and test sets by time. The earliest points are used for training, the next earliest for validation, and the latest points are used as the test set. Try giving every row you want in the training set the same, earliest timestamp, then a second, later timestamp to the validation rows and a third, latest timestamp to the test rows.

 

 

bothma2
New Contributor II

Wouldn't that just do a 60/20/20 split?

mhiltner
Databricks Employee

You'd need to give 80% of your rows the earliest timestamp, then 10% a second, later timestamp, and the remaining 10% a third, latest timestamp.
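As a rough illustration of that idea, here is a minimal pandas sketch that adds such a column. The helper name `add_fake_time_column`, the column name `fake_time`, and the three dates are all made up for the example, and the rows are shuffled before assignment on the assumption that you want a random split. Whether AutoML then honors the 80/10/10 proportions is the premise of the workaround and is not something this snippet checks.

```python
import numpy as np
import pandas as pd


def add_fake_time_column(df, fractions=(0.8, 0.1, 0.1), seed=0, col="fake_time"):
    """Return a copy of df with a synthetic timestamp column (hypothetical helper).

    Rows are randomly assigned one of three timestamps: the earliest for the
    intended training share, the middle one for validation, the latest for test.
    """
    n = len(df)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))

    # Shuffle row positions so the assignment does not depend on row order.
    order = np.random.default_rng(seed).permutation(n)

    # 0 = train, 1 = validation, 2 = test (whatever is left over).
    labels = np.empty(n, dtype=int)
    labels[order[:n_train]] = 0
    labels[order[n_train:n_train + n_val]] = 1
    labels[order[n_train + n_val:]] = 2

    # Arbitrary dates; only their ordering matters.
    stamps = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]).to_numpy()

    out = df.copy()
    out[col] = stamps[labels]
    return out


# Toy data standing in for the real regression table.
df = pd.DataFrame({"x": range(1000), "y": range(1000)})
df = add_fake_time_column(df)

# Share of rows per fake timestamp, earliest first.
print(df["fake_time"].value_counts(normalize=True).sort_index())
```

You would then select `fake_time` as the "Time column for training/validation/testing split" in the advanced section of the AutoML UI. If you use the Python API instead, I believe the equivalent is the `time_col` argument, but check the documentation for your runtime version.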
