OneHotEncoder fails with 'Cannot have an empty string for name'

Mr__E
Contributor II

I have followed the basic guide on using OneHotEncoder, matching the syntax exactly with my own data tables. The tables have enumerated string values. I first run a StringIndexer (both with and without handleInvalid set):

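from pyspark.ml.feature import OneHotEncoder, StringIndexer

# Map each categorical string column to a numeric index column.
indexer = StringIndexer(
    inputCols=indexed_in_categorical_columns,
    outputCols=indexed_out_categorical_columns,
    handleInvalid='keep',
)

# Drop rows with nulls, then fit and apply the indexer.
train_magic = train.select(indexed_in_categorical_columns).dropna()
indexed_stuff = indexer.fit(train_magic)
indexed_stuff_df = indexed_stuff.transform(train_magic)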

Then I encode the indexed columns, both with and without handleInvalid / dropLast set. I've tried individual columns (some work and some don't) as well as all of the columns combined:

dumb_encoder = OneHotEncoder(
    handleInvalid='keep',
    dropLast=True,
    inputCols=indexer.getOutputCols(),
    outputCols=encoded_out_categorical_columns,
)

Then I fit the encoder:

# fit() returns a OneHotEncoderModel, not a DataFrame; this is the call that raises.
encoded_stuff_df = dumb_encoder.fit(indexed_stuff_df.select(indexed_out_categorical_columns))

The error from this step is:

IllegalArgumentException: requirement failed: Cannot have an empty string for name.

The error message is useless, since it drops any information about the offending values. I've verified that the indexed columns have _no_ null values, and I also tried running dropna() (as above), so the error doesn't make sense to me. I checked the param maps on the indexer and encoder and every param has a name, so that's not the issue.
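
One thing dropna() would not catch is an empty string, which is a valid non-null value. As far as I understand, OneHotEncoder builds its output attribute names from the StringIndexer labels, so an empty-string label might be what triggers this message; that is an assumption, not something confirmed here. A minimal sketch for checking it, where find_empty_labels is a hypothetical helper and labelsArray is the per-column list of labels on the fitted StringIndexerModel (available in Spark 3.0+):

```python
def find_empty_labels(columns, labels_array):
    """Return the names of columns whose label list contains an empty string.

    columns: input column names, in the order given to StringIndexer.
    labels_array: one sequence of labels per column, in the same order
        (for example, the labelsArray of a fitted StringIndexerModel).
    """
    offenders = []
    for column, labels in zip(columns, labels_array):
        # An empty string is not null, so dropna() leaves it in place.
        if any(label == "" for label in labels):
            offenders.append(column)
    return offenders


# Hypothetical usage with the fitted model from the snippets above:
# print(find_empty_labels(indexed_in_categorical_columns, indexed_stuff.labelsArray))
```

If this returns any columns, replacing the empty strings with a placeholder value (or filtering those rows out) before indexing would be the next thing to try.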

Any thoughts on how I can figure this out?