How to circumvent Py4JSecurityException for spark-nlp: Constructor public com.johnsnowlabs.nlp.***(java.lang.String) is not whitelisted.

KenAN
New Contributor II

Running into the following error on our company's cluster.

py4j.security.Py4JSecurityException: Constructor public com.johnsnowlabs.nlp.DocumentAssembler(java.lang.String) is not whitelisted.

The error comes from the following code (which is just tutorial code from the spark-nlp page):

from sparknlp.base import DocumentAssembler, Finisher
from sparknlp.annotator import SentenceDetector, Tokenizer, Stemmer, Normalizer
from pyspark.ml import Pipeline

df = spark.createDataFrame([("Yeah, I get that. is the",)], ["comment"])

# Turn the raw text column into spark-nlp documents
document_assembler = DocumentAssembler() \
    .setInputCol("comment") \
    .setOutputCol("document")

sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence") \
    .setUseAbbreviations(True)

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

stemmer = Stemmer() \
    .setInputCols(["token"]) \
    .setOutputCol("stem")

normalizer = Normalizer() \
    .setInputCols(["stem"]) \
    .setOutputCol("normalized")

# Convert the annotations back into plain arrays of strings
finisher = Finisher() \
    .setInputCols(["normalized"]) \
    .setOutputCols(["ntokens"]) \
    .setOutputAsArray(True) \
    .setCleanAnnotations(True)

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, stemmer, normalizer, finisher])

nlp_model = nlp_pipeline.fit(df)
processed = nlp_model.transform(df).persist()

processed.count()
processed.show()

When I tried adding this to the Spark config:

 spark.databricks.pyspark.enablePy4JSecurity false
 

It says

 spark.databricks.pyspark.enablePy4JSecurity is not allowed when choosing access mode
 

I would appreciate any help. It seems others at my company have run into the same issue with other packages.

Thank you

3 REPLIES

Hubert-Dudek
Esteemed Contributor III

That error is prevalent on high-concurrency / shared access mode clusters. Please test it on a single user / standard standalone cluster.
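
For reference, a minimal sketch of what a single user cluster definition might look like through the Databricks Clusters API. The runtime version, node type, worker count, and user name below are placeholders, not values from this thread:

# Hypothetical cluster spec for a single user access mode cluster,
# where the Py4J whitelist enforced on shared access mode does not apply.
# spark_version, node_type_id, num_workers, and single_user_name are placeholders.
# A dict like this could be sent to POST /api/2.0/clusters/create.
single_user_cluster = {
    "cluster_name": "spark-nlp-single-user",
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "data_security_mode": "SINGLE_USER",   # single user access mode
    "single_user_name": "user@example.com",
}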

Anonymous
Not applicable

Hi @Kenan Spruill,

Hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

Apoorv
New Contributor II

Hi @Vidula Khanna,

I would like to know more about the suggested solution to the above problem. I have upgraded my cluster to 11.3 LTS (Unity Catalog enabled) with shared access mode, but one of the Java functions I am using still gives the whitelisting error. Could you please suggest a possible solution while keeping the shared cluster access mode?
