Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

NhatHoang
by Valued Contributor II
  • 5937 Views
  • 3 replies
  • 15 kudos

Resolved! Do one-hot encoding (OHE) before or after splitting data into train and test DataFrames?

Hi, I wonder whether I should do OHE before or after I split the data when building an ML model. Please give some advice.

Latest Reply
LandanG
Databricks Employee
  • 15 kudos

Hi @Nhat Hoang, While not Databricks-specific, here's a good answer: "If you perform the encoding before the split, it will lead to data leakage (train-test contamination). In this sense, you will introduce new data (integers of Label Encoders) and u...
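
To make the quoted advice concrete, here is a minimal sketch of the "split first, then encode" order. It is a hand-rolled illustration in plain Python, not code from the thread: the helpers `fit_categories` and `one_hot`, the `color` column, and the toy rows are all hypothetical, and the split is a fixed slice where a real one would be random. The point is only that the category vocabulary is learned from the training rows and then reused unchanged on the test rows.

```python
def fit_categories(rows, column):
    """Learn the category vocabulary from the given rows only."""
    return sorted({row[column] for row in rows})


def one_hot(rows, column, categories):
    """Encode `column` against a fixed vocabulary.

    A value that is not in `categories` becomes an all-zero vector.
    """
    return [[1 if row[column] == cat else 0 for cat in categories] for row in rows]


# Hypothetical toy data: one categorical column.
values = ["red", "green", "red", "blue", "green", "red", "purple", "blue"]
rows = [{"color": v} for v in values]

# 1. Split first (fixed slice here to keep the example deterministic).
train, test = rows[:6], rows[6:]

# 2. Fit the encoder on the training rows only.
categories = fit_categories(train, "color")

# 3. Transform both sets with the vocabulary learned from train.
X_train = one_hot(train, "color", categories)
X_test = one_hot(test, "color", categories)

print(categories)
print(X_test)
```

In this toy split, "purple" appears only in the test rows, so it never enters the vocabulary and is encoded as all zeros. Library encoders generally follow the same fit-on-train, transform-on-both pattern; for example, scikit-learn's `OneHotEncoder` accepts `handle_unknown="ignore"` to treat unseen categories this way.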

2 More Replies