Resolved! Do One-Hot-Encoding (OHE) before or after splitting data into train and test DataFrames?
Hi, I wonder whether I should do OHE before or after I split the data to build an ML model. Please give some advice.
Latest Reply
Hi @Nhat Hoang, while not Databricks-specific, here's a good answer: "If you perform the encoding before the split, it will lead to data leakage (train-test contamination). In this sense, you will introduce new data (integers of Label Encoders) and u...
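To make the quoted advice concrete, here is a minimal sketch in plain Python of splitting first and then fitting the encoder on the training split only. The `colors` column and the helper names `fit_categories` and `one_hot` are hypothetical and used only for illustration; they are not a Databricks or library API. Categories that appear only in the test split are encoded as all zeros, which is similar in spirit to what scikit-learn's `OneHotEncoder(handle_unknown="ignore")` does.

```python
def fit_categories(train_values):
    """Learn the category vocabulary from the training split only."""
    return sorted(set(train_values))


def one_hot(values, categories):
    """Encode values against a fixed vocabulary; unseen values become all zeros."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows


# Hypothetical toy column, split BEFORE any encoder is fitted.
colors = ["red", "green", "red", "blue", "green", "purple"]
train_colors, test_colors = colors[:4], colors[4:]

# Fit on the training split only, then reuse that vocabulary for both splits.
categories = fit_categories(train_colors)
train_encoded = one_hot(train_colors, categories)
test_encoded = one_hot(test_colors, categories)

print(categories)
print(test_encoded)
```

Here "purple" occurs only in the test split, so it never influences the fitted vocabulary, and the test rows are encoded with exactly the columns the model saw during training.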