How to circumvent Py4JSecurityException for spark-nlp : Constructor public com.johnsnowlabs.nlp.***(java.lang.String) is not whitelisted.

KenAN · 10-12-2022

I'm running into the following error on our company's Databricks cluster:

py4j.security.Py4JSecurityException: Constructor public com.johnsnowlabs.nlp.DocumentAssembler(java.lang.String) is not whitelisted.

It is raised by the following code (which is just tutorial code from the spark-nlp page):

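from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Finisher
from sparknlp.annotator import SentenceDetector, Tokenizer, Stemmer, Normalizer

# `spark` is assumed to be the SparkSession that the notebook provides
df = spark.createDataFrame([("Yeah, I get that. is the",)], ["comment"])

# Wrap the raw text column as a Spark NLP document annotation
# (this constructor is the call that raises Py4JSecurityException)
document_assembler = DocumentAssembler() \
    .setInputCol("comment") \
    .setOutputCol("document")

# Split each document into sentences
sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence") \
    .setUseAbbreviations(True)

# Split each sentence into tokens
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

# Reduce each token to its stem
stemmer = Stemmer() \
    .setInputCols(["token"]) \
    .setOutputCol("stem")

# Clean up the stemmed tokens
normalizer = Normalizer() \
    .setInputCols(["stem"]) \
    .setOutputCol("normalized")

# Convert the annotations back into a plain array column
finisher = Finisher() \
    .setInputCols(["normalized"]) \
    .setOutputCols(["ntokens"]) \
    .setOutputAsArray(True) \
    .setCleanAnnotations(True)

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, stemmer, normalizer, finisher])

nlp_model = nlp_pipeline.fit(df)
processed = nlp_model.transform(df).persist()

processed.count()
processed.show()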

When I tried adding this to the cluster's Spark config:

 spark.databricks.pyspark.enablePy4JSecurity false

the cluster configuration rejects it with:

 spark.databricks.pyspark.enablePy4JSecurity is not allowed when choosing access mode
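For anyone trying to reproduce this, a minimal sketch of how the effective value of that setting could be read back from a notebook cell. The helper name `py4j_security_setting` is hypothetical, and it is an assumption that the cluster allows reading this key through `spark.conf`; the helper only relies on the object having a `get(key, default)` method, which PySpark's runtime config provides.

```python
# Config key taken from the post above.
PY4J_SECURITY_KEY = "spark.databricks.pyspark.enablePy4JSecurity"


def py4j_security_setting(conf, key=PY4J_SECURITY_KEY):
    """Return the configured value for `key`, or None if it is not set.

    `conf` is any object exposing get(key, default), for example the
    `spark.conf` object of a SparkSession (hypothetical helper, not part
    of PySpark or Databricks).
    """
    return conf.get(key, None)


# In a notebook, where `spark` is assumed to be the provided SparkSession:
#   print(py4j_security_setting(spark.conf))
```

A result of None would only mean the key is not explicitly set in the session; it says nothing by itself about which default the access mode enforces.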

I would appreciate any help. It seems others at my company have run into the same issue with other packages.

Thank you
