03-03-2023 05:32 AM
I have been using "rdd.flatMap(lambda x:x)" for a while to create lists from columns. However, after switching the cluster to Shared access mode (to use Unity Catalog), I get the following error:
py4j.security.Py4JSecurityException: Method public org.apache.spark.rdd.RDD org.apache.spark.api.java.JavaRDD.rdd() is not whitelisted on class class org.apache.spark.api.java.JavaRDD
I have tried to solve the error by adding:
"spark.databricks.pyspark.enablePy4JSecurity false"
however I then get the following error:
"spark.databricks.pyspark.enablePy4JSecurity is not allowed when chossing an access mode"
Does anybody know how to use RDDs on a cluster that is set up for Unity Catalog?
Thank you!
05-06-2024 02:21 AM
Was this resolved?
05-29-2024 10:44 AM
I was having a similar issue when using .rdd.map().
I solved it by adding two key-value pairs to the Spark config for the cluster:
spark.databricks.pyspark.enablePy4JSecurity false
spark.databricks.pyspark.trustedFilesystems org.apache.spark.api.java.JavaRDD
After this, I was able to read the schema of the JSON from the column that had been read as a string.
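If changing the cluster config is not an option (the original poster reports that the setting is rejected once an access mode is chosen), one possible workaround for the "list from a column" case is to stay on the DataFrame API and avoid .rdd entirely. The following is only a minimal sketch: the helper name column_to_list is made up for illustration, and it assumes the column is small enough to collect to the driver, just as the flatMap approach did.

```python
def column_to_list(df, column):
    """Collect a single DataFrame column into a Python list without using .rdd.

    `df` is expected to be a Spark DataFrame; `column` is the column name.
    Hypothetical helper, not part of PySpark.
    """
    # select() keeps only the requested column; collect() returns Row objects,
    # and row[0] is the value of that single column.
    return [row[0] for row in df.select(column).collect()]


# Usage on a cluster (assumes a DataFrame `df` with a column named "id"):
# ids = column_to_list(df, "id")
```

Since this only uses DataFrame.select and DataFrame.collect, it should not hit the JavaRDD.rdd() call named in the Py4J error, though I have not verified it on every runtime version.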
06-12-2024 08:20 AM
Did you try this on a UC-enabled cluster?
06-13-2024 04:44 AM
In my case the problem was that we were trying to use SparkXGBoostRegressor, and the docs say that it does not work on clusters with autoscaling enabled. So we just disabled autoscaling for the interactive cluster where we were testing the model, and it worked like a charm 🙂
Hope it helps
Sunday
Hello,
In the past I used
rdd.mapPartitions(lambda ...)
to call functions that access third-party APIs, such as Azure AI translate text, so that the API is called in batches and the batched data is returned.
How would one do this now?
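One option that may be worth trying is DataFrame.mapInPandas, which hands your function an iterator of pandas DataFrames per partition, much like mapPartitions did with rows. The sketch below is only an illustration under assumptions: the DataFrame is assumed to have a single string column named "text", fake_translate is a hypothetical stand-in for the real API client, and whether mapInPandas is allowed on your shared cluster depends on the runtime version, so please check that first.

```python
def call_in_batches(texts, call_api, batch_size=100):
    """Send `texts` to `call_api` in chunks of `batch_size` and return all results in order."""
    results = []
    for start in range(0, len(texts), batch_size):
        results.extend(call_api(texts[start:start + batch_size]))
    return results


def fake_translate(batch):
    """Hypothetical stand-in for the real third-party API call (one request per batch)."""
    return [text.upper() for text in batch]


def translate_partitions(frames):
    """Function for df.mapInPandas: receives and yields pandas DataFrames."""
    for pdf in frames:
        pdf = pdf.copy()
        # Assumes the input has a string column named "text".
        pdf["translated"] = call_in_batches(pdf["text"].tolist(), fake_translate)
        yield pdf


# Usage on a cluster (assumes `df` has exactly one column, "text"):
# result = df.mapInPandas(translate_partitions, schema="text string, translated string")
```

Keeping the batching logic in call_in_batches, separate from the Spark wrapper, makes it possible to test the API calls locally without a cluster.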