Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

NhatHoang
by Valued Contributor II
  • 4853 Views
  • 3 replies
  • 15 kudos

Resolved! Do One-Hot-Encoding (OHE) before or after splitting data into train and test dataframes?

Hi, I wonder whether I should do OHE before or after I split the data to build an ML model. Please give me some advice.

Latest Reply
LandanG
Honored Contributor
  • 15 kudos

Hi @Nhat Hoang, while not Databricks-specific, here's a good answer: "If you perform the encoding before the split, it will lead to data leakage (train-test contamination). In this sense, you will introduce new data (integers of Label Encoders) and u...
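
For anyone skimming, here is a minimal sketch of the split-first order the quoted answer argues for, in plain Python with no libraries. The toy `colors` column, the fixed 6/2 split, and the helper names `fit_categories` and `one_hot` are all made up for illustration and do not come from the thread. The idea is that the category vocabulary is learned from the training rows only, and the test rows are encoded against that fixed vocabulary.

```python
def fit_categories(values):
    """Learn the category vocabulary from the given (training) values only."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode values against a fixed vocabulary.

    A value that is not in the vocabulary (for example, a category that
    appears only in the test split) becomes an all-zero row.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows


# Hypothetical toy column; "purple" occurs only in the last two rows.
colors = ["red", "green", "blue", "red", "green", "red", "purple", "blue"]

# 1. Split first (a fixed split here so the example is deterministic).
train, test = colors[:6], colors[6:]

# 2. Fit the encoding on the training split only.
categories = fit_categories(train)

# 3. Apply the same fitted encoding to both splits.
train_encoded = one_hot(train, categories)
test_encoded = one_hot(test, categories)

print(categories)
print(test_encoded)
```

Because `purple` never appears in the training rows, it gets no column of its own and is encoded as all zeros in the test split, so nothing from the test data shapes the features. As far as I know, scikit-learn's `OneHotEncoder` with `handle_unknown="ignore"` follows the same fit-on-train, transform-on-test pattern, but treat that as a pointer to check rather than part of the original answer.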

2 More Replies