Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Cannot use RDD and cannot set "spark.databricks.pyspark.enablePy4JSecurity false" for cluster

Christine
Contributor II

I have been using "rdd.flatMap(lambda x: x)" for a while to create lists from columns. However, after I changed the cluster to Shared access mode (to use Unity Catalog), I get the following error:

py4j.security.Py4JSecurityException: Method public org.apache.spark.rdd.RDD org.apache.spark.api.java.JavaRDD.rdd() is not whitelisted on class class org.apache.spark.api.java.JavaRDD

I have tried to solve the error by adding:

"spark.databricks.pyspark.enablePy4JSecurity false"

however I then get the following error:

"spark.databricks.pyspark.enablePy4JSecurity is not allowed when choosing an access mode"

Does anybody know how to use RDDs on a cluster enabled for Unity Catalog?

Thank you!

19 REPLIES

rahuja
New Contributor III

Was this resolved?

him_agg
New Contributor II

I was having a similar issue using .rdd.map(). I solved it by adding two key-value pairs to the cluster's Spark config:

spark.databricks.pyspark.enablePy4JSecurity false

spark.databricks.pyspark.trustedFilesystems org.apache.spark.api.java.JavaRDD

 

After this I was able to read the schema of the JSON from the column that was read as a string:

    json_schema = spark.read.json(df.rdd.map(lambda row: row.preferences)).schema
    print(json_schema)

Did you try this on a UC-enabled cluster?

rahuja
New Contributor III

In my case, the problem was that we were trying to use SparkXGBoostRegressor, whose docs say it does not work on clusters with autoscaling enabled. We simply disabled autoscaling on the interactive cluster where we were testing the model, and it worked like a charm 🙂

 

Hope it helps

de-qrosh
New Contributor II

Hello,
In the past I used

rdd.mapPartitions(lambda ...)

to call functions that access third-party APIs (such as Azure AI text translation), batching the calls to the API and returning the batched results.

How would one do this now?
