03-29-2022 07:16 AM
My earlier search for empty strings in the original table didn't catch them. So my guess is that, even though the encoder runs on the indexed columns, it validates against the original columns and ignores the 'handleInvalid' option, which leads to the error. It's very confusing. Here is a workaround:
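from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
# Map empty strings to "NA"; every other value (including null) passes through unchanged
transform_empty = udf(lambda s: "NA" if s == "" else s, StringType())
for col in indexed_in_categorical_columns:
    train = train.withColumn(col, transform_empty(col))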