
Azure Shared Clusters - Py4J Security Exception on non-whitelisted classes

abhaigh
New Contributor III

Hi all

Having some fun trying to run a notebook on a UC-aware shared cluster - I keep running into this error:

py4j.security.Py4JSecurityException: Method public static org.apache.spark.sql.SparkSession org.apache.sedona.spark.SedonaContext.create(org.apache.spark.sql.SparkSession) is not whitelisted on class class org.apache.sedona.spark.SedonaContext

Now - the notebook that is having the problem runs perfectly on a non-shared cluster - but I need to have it running on the shared cluster

I can't find ANY information on how to get this public static method whitelisted on the class so it can run w/o error on the shared cluster

So... My question is - how do I whitelist this public static method so that my cluster doesn't barf when it tries to run the "SedonaContext.create(spark)" command that is generating the error?

...and before you suggest it - "Table Access Control" is NOT enabled - so it's not that

Cheers

-=A=-

.

1 ACCEPTED SOLUTION

abhaigh
New Contributor III

Thanks for the reply Kaniz - Not that any of your possible solutions work, but thanks anyway

I don't have the option to set "Notebook Settings" in my Databricks notebook in Azure

The system refuses to permit me to set "spark.databricks.pyspark.enablePy4JSecurity false" on the SHARED cluster because of the security risks

You would have been better off doing the same research I did and referring me to THIS page

"spark.databricks.pyspark.enablePy4JSecurity is not allowed when choosing an access mode" [https://community.databricks.com/t5/data-governance/quot-spark-databricks-pyspark-enablepy4jsecurity...]

And this comment on it 

"Hi @Christine Pedersen, It is not possible to use it in a shared cluster as of now. Thank you!" [https://community.databricks.com/t5/data-governance/quot-spark-databricks-pyspark-enablepy4jsecurit...]

And no - I'm not opening up a support call with Databricks, not when I know what they will tell me

Again, thanks anyway - but what I wanted to do is currently impossible
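
Since the call can't be allowed on the shared cluster, about the best a notebook can do is fail with a clearer message. A minimal sketch, assuming the blocked call surfaces in Python as an exception whose text contains "Py4JSecurityException" (as in the error quoted in the question). The names call_jvm_guarded and SharedClusterUnsupported are made up for illustration and are not part of Databricks, py4j or Sedona:

```python
from typing import Any, Callable


class SharedClusterUnsupported(RuntimeError):
    """Hypothetical error type: a JVM call was blocked by Py4J security."""


def is_py4j_security_error(exc: BaseException) -> bool:
    # Assumption: the Java exception class name appears in the Python-side
    # exception type or message. Matching on text avoids importing py4j here.
    return "Py4JSecurityException" in f"{type(exc).__name__}: {exc}"


def call_jvm_guarded(factory: Callable[[], Any], what: str) -> Any:
    """Run factory(); translate a Py4J security failure into a clearer error."""
    try:
        return factory()
    except Exception as exc:
        if is_py4j_security_error(exc):
            raise SharedClusterUnsupported(
                f"{what} is blocked by Py4J security on this cluster; "
                "run it on a cluster with a non-shared access mode instead."
            ) from exc
        # Anything else is not a whitelist problem, so let it propagate.
        raise
```

In the notebook this would wrap the failing line, e.g. call_jvm_guarded(lambda: SedonaContext.create(spark), "SedonaContext.create"). It doesn't make the call work, it only makes the reason obvious to whoever runs the notebook next.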

-=A=-

.

2 REPLIES

Kaniz
Community Manager

Hi @abhaigh, it seems you’re encountering a security issue related to the Py4J framework when running your notebook on a shared cluster.

Let’s address this and explore potential solutions:


Py4J Security Exception:

  • The error message you’re seeing indicates that the method org.apache.sedona.spark.SedonaContext.create is not whitelisted for execution in the shared cluster.
  • Py4J is a bridge between Python and Java, allowing Python code to interact with Java objects (such as Spark).
  • By default, Databricks clusters have security features enabled to prevent unsafe operations.

Whitelisting the Method:

  • To resolve this issue, you can whitelist the specific classes and methods that you need to use in your PySpark code.
  • One way to achieve this is by setting the spark.jvm.class.allowlist configuration property in your S....
  • Here’s how you can do it:
    • In your Databricks notebook, click on “File” > “Notebook Settings.”
    • Under “Advanced Options,” add the following configuration: spark.jvm.class.allowlist org.apache.sedona.spark.SedonaContext
    • Save the settings and restart your cluster.
  • This approach allows the security feature to remain turned on while explicitly allowing the specified class (SedonaContext) to execute.

Alternative Approach:

  • If whitelisting doesn’t work or if you encounter any limitations, consider an alternative approach:
    • Disable Py4J Security (Not Recommended):
      • You can disable the security feature altogether by setting spark.databricks.pyspark.enablePy4JSecurity to false.
      • However, this option is not recommended due to security risks.
    • Review Cluster Configuration:
      • Compare the cluster configuration between the working non-shared cluster and the shared cluster.
      • Look for any additional settings related to catalog whitelisting or security.
      • Ensure that the shared cluster has the necessary permissions and configurations.
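
The configuration comparison suggested above could be sketched as follows. This is only an illustration: diff_spark_conf is a hypothetical helper, and the two dictionaries are assumed to be filled in by hand from each cluster's Spark config settings, with made-up example values.

```python
from typing import Dict, Optional, Tuple


def diff_spark_conf(
    working: Dict[str, str], failing: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return keys whose values differ, as (working_value, failing_value).

    A value of None means the key is not set on that cluster.
    """
    keys = sorted(set(working) | set(failing))
    return {
        key: (working.get(key), failing.get(key))
        for key in keys
        if working.get(key) != failing.get(key)
    }


# Hypothetical values, for illustration only.
non_shared = {
    "spark.databricks.pyspark.enablePy4JSecurity": "false",
    "spark.sql.shuffle.partitions": "200",
}
shared = {
    "spark.sql.shuffle.partitions": "200",
}

for key, (ok_value, bad_value) in diff_spark_conf(non_shared, shared).items():
    print(f"{key}: non-shared={ok_value!r} shared={bad_value!r}")
```

Only the keys that differ are reported, which keeps the output short when the two clusters share most of their settings.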

Remember to balance security and functionality when making these adjustments. Whitelisting specific classes is a safer approach than disabling security entirely.

If you encounter any further issues, consider reaching out to Databricks support by filing a support ticket for more specific guidance. 🚀🔒📝
