How to circumvent Py4JSecurityException for spark-nlp: Constructor public com.johnsnowlabs.nlp.***(java.lang.String) is not whitelisted.

KenAN
New Contributor II

Running into the following error on our company's cluster.

py4j.security.Py4JSecurityException: Constructor public com.johnsnowlabs.nlp.DocumentAssembler(java.lang.String) is not whitelisted.

It is raised by the following code (which is just tutorial code from the spark-nlp page):

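# Imports the snippet relies on; `spark` is the session a Databricks notebook provides.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Finisher
from sparknlp.annotator import SentenceDetector, Tokenizer, Stemmer, Normalizer

df = spark.createDataFrame([("Yeah, I get that. is the",)], ["comment"])

# Turn the raw text column into a Spark NLP document annotation.
document_assembler = DocumentAssembler() \
    .setInputCol("comment") \
    .setOutputCol("document")

# Split each document into sentences.
sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence") \
    .setUseAbbreviations(True)

# Split sentences into tokens, then stem and normalize them.
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

stemmer = Stemmer() \
    .setInputCols(["token"]) \
    .setOutputCol("stem")

normalizer = Normalizer() \
    .setInputCols(["stem"]) \
    .setOutputCol("normalized")

# Convert the annotations back into a plain array column.
finisher = Finisher() \
    .setInputCols(["normalized"]) \
    .setOutputCols(["ntokens"]) \
    .setOutputAsArray(True) \
    .setCleanAnnotations(True)

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, stemmer, normalizer, finisher])

nlp_model = nlp_pipeline.fit(df)
processed = nlp_model.transform(df).persist()

processed.count()
processed.show()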

When I tried adding this to the Spark config:

 spark.databricks.pyspark.enablePy4JSecurity false
 

it says:

 spark.databricks.pyspark.enablePy4JSecurity is not allowed when choosing access mode
 

I would appreciate any help. It seems others at my company have run into the same issue with other packages.

Thank you

3 REPLIES

Hubert-Dudek
Esteemed Contributor III

That error is common on high-concurrency / shared clusters. Please test it on a single-user / standard (standalone) cluster.
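To check whether a given cluster blocks a constructor before running a whole pipeline, a small probe along these lines might help. It is only a sketch: `blocked_constructor` and `probe` are hypothetical helper names, and the pattern is based solely on the message wording quoted in the question, which may differ between runtime versions.

```python
import re

# Matches the wording of the error quoted in the question. This is an
# assumption drawn from that single message, not a documented format.
_NOT_WHITELISTED = re.compile(
    r"Py4JSecurityException: Constructor public (?P<cls>[\w.$]+)\("
)


def blocked_constructor(message):
    """Return the JVM class named in a 'not whitelisted' error, or None."""
    match = _NOT_WHITELISTED.search(message)
    return match.group("cls") if match else None


def probe(factory):
    """Call factory() and return the blocked JVM class name, if any.

    Returns None when the call succeeds. Errors that do not look like a
    Py4J whitelist failure are re-raised unchanged.
    """
    try:
        factory()
    except Exception as exc:
        blocked = blocked_constructor(str(exc))
        if blocked is None:
            raise
        return blocked
    return None
```

In a notebook where Spark NLP is installed, `probe(lambda: DocumentAssembler())` would be expected to return the class name on a cluster that blocks it and `None` on a cluster that does not.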

Anonymous
Not applicable

Hi @Kenan Spruill

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

Apoorv
New Contributor II

Hi @Vidula Khanna,

I would like to know more about the suggested solution to the above problem. I have upgraded my cluster to 11.3 LTS (Unity Catalog enabled) in shared cluster mode, but one of the Java functions I am using gives the whitelisting error. Could you please suggest a possible solution that still keeps the shared cluster access mode?
