I'm running into the following error on our company's cluster:
py4j.security.Py4JSecurityException: Constructor public com.johnsnowlabs.nlp.DocumentAssembler(java.lang.String) is not whitelisted.
It happens with the following code (which is just tutorial code from the Spark NLP page):
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Finisher
from sparknlp.annotator import SentenceDetector, Tokenizer, Stemmer, Normalizer

# `spark` is the session already provided by the notebook.
df = spark.createDataFrame([("Yeah, I get that. is the",)], ["comment"])

# The error is raised here, when the first annotator is constructed.
document_assembler = DocumentAssembler() \
    .setInputCol("comment") \
    .setOutputCol("document")
sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence") \
    .setUseAbbreviations(True)
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")
stemmer = Stemmer() \
    .setInputCols(["token"]) \
    .setOutputCol("stem")
normalizer = Normalizer() \
    .setInputCols(["stem"]) \
    .setOutputCol("normalized")
finisher = Finisher() \
    .setInputCols(["normalized"]) \
    .setOutputCols(["ntokens"]) \
    .setOutputAsArray(True) \
    .setCleanAnnotations(True)

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, stemmer, normalizer, finisher])
nlp_model = nlp_pipeline.fit(df)
processed = nlp_model.transform(df).persist()
processed.count()
processed.show()
When I tried adding this to the Spark config:
spark.databricks.pyspark.enablePy4JSecurity false
I got this message:
spark.databricks.pyspark.enablePy4JSecurity is not allowed when choosing access mode
I would appreciate any help. It seems others at my company have run into the same issue with other packages.
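In case it helps to compare against the failures others here have seen with other packages, below is a rough sketch of a helper that pulls the blocked member out of a "not whitelisted" message. The function name `blocked_member` and the regex are my own assumptions, inferred only from the single message quoted above; other Py4J versions may word the message differently.

```python
import re

# Pattern inferred from the one error message in this post (assumption):
# "<Kind> public <signature> is not whitelisted"
_NOT_WHITELISTED = re.compile(
    r"(?P<kind>\w+) public (?P<signature>.+?) is not whitelisted"
)


def blocked_member(message):
    """Return (kind, signature) from a 'not whitelisted' message, or None."""
    match = _NOT_WHITELISTED.search(message)
    if match is None:
        return None
    return match.group("kind"), match.group("signature")


msg = (
    "py4j.security.Py4JSecurityException: Constructor public "
    "com.johnsnowlabs.nlp.DocumentAssembler(java.lang.String) is not whitelisted."
)
print(blocked_member(msg))
```

Running this over the messages from the other packages should show whether they are all blocked at a constructor call like this one.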
Thank you