Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-17-2021 12:59 PM
By far the most popular and comprehensive library, to my knowledge, for Spark-native distributed NLP, is spark-nlp from John Snow Labs. https://nlp.johnsnowlabs.com/ It is open source (but with commercial support options) and has a whole lot of functionality.
You can also use spacy, nltk, and other non-Spark NLP libraries with Spark, by writing pandas UDFs that leverage these libraries, then applying them to data with Spark.