Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can you help with this error please? Issue when using a new high concurrency cluster

TJS
New Contributor II

Hello, I am trying to use MLflow on a new high concurrency cluster, but I get the error below. Does anyone have any suggestions? It was working before on a standard cluster. Thanks.

py4j.security.Py4JSecurityException: Method public int org.apache.spark.SparkContext.maxNumConcurrentTasks() is not whitelisted on class class org.apache.spark.SparkContext

---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<command-2769834740329298> in <module>
     32 # Greater parallelism will lead to speedups, but a less optimal hyperparameter sweep.
     33 # A reasonable value for parallelism is the square root of max_evals.
---> 34 spark_trials = SparkTrials(parallelism=10)
     35
     36

/databricks/.python_edge_libs/hyperopt/spark.py in __init__(self, parallelism, timeout, loss_threshold, spark_session)
    101         )
    102         # maxNumConcurrentTasks() is a package private API
--> 103         max_num_concurrent_tasks = self._spark_context._jsc.sc().maxNumConcurrentTasks()
    104         spark_default_parallelism = self._spark_context.defaultParallelism
    105         self.parallelism = self._decide_parallelism(

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306
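As an aside, the comment visible in the traceback suggests choosing parallelism as roughly the square root of max_evals: higher parallelism finishes the sweep faster, but the optimizer adapts less because more trials are launched before earlier results come back. A minimal sketch of that heuristic (variable names here are illustrative, not part of the hyperopt API):

```python
import math

# Heuristic from the traceback's comment: parallelism ~ sqrt(max_evals).
# Clamp to at least 1 so small sweeps still run.
max_evals = 100
parallelism = max(1, int(math.sqrt(max_evals)))
print(parallelism)  # 10 for max_evals = 100
```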

1 ACCEPTED SOLUTION

User16753724828
New Contributor III

@Tom Soto​ We have a workaround for this. The following cluster Spark configuration setting disables Py4J security while still enabling passthrough:

 spark.databricks.pyspark.enablePy4JSecurity false
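For anyone finding this later: as far as I know, this setting must be applied in the cluster's Spark configuration (Clusters > Advanced options > Spark) before the cluster starts; it cannot be changed from a notebook at runtime, since Py4J security is enforced at cluster startup. The entry would look like:

```
spark.databricks.pyspark.enablePy4JSecurity false
```

Note that disabling Py4J security relaxes the isolation guarantees of a high concurrency cluster, so it is worth confirming that trade-off is acceptable in your workspace.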


6 REPLIES

Anonymous
Not applicable

Hello, @Tom Soto​! My name is Piper and I'm a moderator for Databricks. It's great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, I will follow up shortly with a response.

jose_gonzalez
Moderator

Hi @Tom Soto​,

The error message comes from your high concurrency cluster's security model. This is a built-in security model that restricts access to your data, which is why code that works on a standard cluster may fail on a high concurrency cluster.

TJS
New Contributor II

Thank you for your response. I appreciate it, but are you aware of any workaround that would let me use a high concurrency cluster? The issue is with a built-in Databricks function.

User16753724828
New Contributor III

@Tom Soto​ We have a workaround for this. The following cluster Spark configuration setting disables Py4J security while still enabling passthrough:

 spark.databricks.pyspark.enablePy4JSecurity false

TJS
New Contributor II

Thank you very much. This workaround worked for me.

Piper_Wilson
New Contributor III

@Tom Soto​ - If Pradpalnis fully answered your question, would you be happy to mark their answer as best so that others can quickly find the solution?
