-werners-

I am still not convinced you need loops.

Regex matching can be done in Spark with built-in functions such as regexp_replace, regexp_extract and rlike.

Your list of words to check against can also be put in a DataFrame.

I agree it does not seem obvious, but the moment you start looping you say goodbye to parallel processing.