Azure Shared Clusters - Py4J Security Exception on non-whitelisted classes
09-03-2023 06:58 PM
I installed a third-party JAR on an Azure shared cluster via Maven and can import it successfully, but when I try to use it I get the following error:
py4j.security.Py4JSecurityException: Method public static org.apache.spark.sql.Column com.databricks.spark.xx.yy.zz() is not whitelisted on class class com.databricks.spark.xx.yy
How do I whitelist third-party library code?
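For reference when reporting which methods are blocked, here is a minimal sketch that pulls the method signature and class name out of a message like the one above. It assumes the message always follows the format shown in my error; the helper name `parse_py4j_security_error` is my own and is not part of Py4J or Databricks.

```python
import re

# Pattern based only on the single error message quoted above (an assumption:
# other runtime versions may word the message differently).
_PATTERN = re.compile(
    r"Method (?P<signature>.+?) is not whitelisted on class (?:class )?(?P<cls>\S+)"
)


def parse_py4j_security_error(message):
    """Return the blocked method signature and class, or None if no match.

    Hypothetical helper for illustration; it only does string parsing and
    does not talk to Spark or Py4J.
    """
    match = _PATTERN.search(message)
    if match is None:
        return None
    return {"signature": match.group("signature"), "class": match.group("cls")}


if __name__ == "__main__":
    msg = (
        "py4j.security.Py4JSecurityException: Method public static "
        "org.apache.spark.sql.Column com.databricks.spark.xx.yy.zz() "
        "is not whitelisted on class class com.databricks.spark.xx.yy"
    )
    print(parse_py4j_security_error(msg))
```

This only identifies what was blocked; it does not change what the cluster allows.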
09-04-2023 06:25 AM
Thanks Kaniz.
I must use a shared cluster because I'm reading from a DLT table stored in a Unity Catalog.
https://docs.databricks.com/en/data-governance/unity-catalog/compute.html
My understanding is that shared clusters enforce the Py4J policy I referenced. I am not sure whether this is the same as what you refer to as "table access control", but I am also not trying to use readStream(). Rather, I'm trying to use code from a third-party library that isn't included in the base cluster runtime. I've installed this library by supplying Maven coordinates in the compute configuration.
So I am wondering whether, as a customer who must use a shared cluster under the circumstances I described, I can allowlist third-party code that I choose. Otherwise, how is one to use third-party code that hasn't yet been allowlisted while reading from DLT in Unity Catalog?