
Unity Catalog - spark.* functions throwing Py4JSecurityException - org.apache.spark.sql.internal.CatalogImpl.currentCatalog() is not whitelisted on class class org.apache.spark.sql.internal.CatalogImpl

jakubk
Contributor

I'm looking to migrate to Unity Catalog, but a number of my data ingestion notebooks throw SecurityException/whitelist errors for numerous spark.* functions.

Is there some configuration setting I need to enable to whitelist the spark.* methods/functions?

I know it's because I'm using 'shared' access mode. Before, with the Hive metastore and external tables, I always ran 'no isolation shared' clusters.

I use externally managed tables and call spark.catalog to check whether a table exists before I create it. This fails with the whitelist error. I guess I could refactor that check to query information_schema instead?
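A minimal sketch of that refactor, assuming the cluster allows `spark.sql` (which shared access mode does) and that the user can read `<catalog>.information_schema.tables`. `build_table_exists_query` and `table_exists` are hypothetical helper names. To keep the SQL text safe without relying on version-specific parameter binding, names are restricted to letters, digits and underscores, and they are lower-cased on the assumption that Unity Catalog stores object names in lower case:

```python
import re

# Only plain identifiers are accepted, so they can be embedded in the SQL text
# without any quoting or escaping issues.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_]+")


def build_table_exists_query(catalog, schema, table):
    """Build a query against the catalog's information_schema that returns
    one row if catalog.schema.table exists and no rows otherwise."""
    for name in (catalog, schema, table):
        if not _PLAIN_NAME.fullmatch(name):
            raise ValueError(f"unsupported name: {name!r}")
    # Assumption: Unity Catalog stores object names in lower case, so the
    # lookup values are lower-cased before comparing.
    return (
        f"SELECT 1 FROM `{catalog}`.information_schema.tables "
        f"WHERE table_schema = '{schema.lower()}' "
        f"AND table_name = '{table.lower()}' "
        "LIMIT 1"
    )


def table_exists(spark, catalog, schema, table):
    """Stand-in for spark.catalog.tableExists that only uses spark.sql."""
    query = build_table_exists_query(catalog, schema, table)
    return len(spark.sql(query).collect()) > 0
```

Usage would look like `if not table_exists(spark, "main", "ingest", "raw_events"): ...` followed by the existing CREATE TABLE logic (the catalog, schema and table names here are made up).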

Any tips on how to refactor the following?

I have multiple TSVs with free-text comments at the top of each file. I need to skip n lines and process the rest:

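    # Pair each line with its index, drop the first n_skip_rows free-text
    # lines, keep only the line text, then parse the rest as TSV.
    row_rdd = (
        spark.sparkContext.textFile(sourceFilePath)
        .zipWithIndex()
        .filter(lambda row: row[1] >= n_skip_rows)
        .map(lambda row: row[0])
    )
    df = spark.read.csv(row_rdd, sep="\t", header=True, inferSchema=True)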
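One way to avoid the RDD API here, sketched under assumptions: as far as I know, `spark.sparkContext` and RDD operations such as `zipWithIndex` are not available on shared access mode clusters, so this version skips the preamble on the driver with Python's standard `csv` module and then calls `spark.createDataFrame`. It assumes each file is small enough to read on the driver and is reachable through a local path (for example, a Unity Catalog volume under `/Volumes/...`). `parse_tsv_with_preamble` and `load_tsv_as_dataframe` are hypothetical names:

```python
import csv
from itertools import islice


def parse_tsv_with_preamble(lines, n_skip_rows):
    """Drop the first n_skip_rows lines (free-text comments), treat the next
    line as the header, and parse the remaining lines as tab-separated rows."""
    reader = csv.reader(islice(lines, n_skip_rows, None), delimiter="\t")
    header = next(reader)
    rows = list(reader)
    return header, rows


def load_tsv_as_dataframe(spark, source_file_path, n_skip_rows):
    """Hypothetical helper: read the file on the driver (assumes the path is
    reachable via local file APIs) and build a DataFrame without touching
    spark.sparkContext."""
    with open(source_file_path, newline="") as f:
        header, rows = parse_tsv_with_preamble(f, n_skip_rows)
    # Every column comes back as a string; there is no inferSchema here.
    return spark.createDataFrame(rows, schema=header)
```

Because every column is a string, cast the columns you need afterwards. Driver-side parsing is a trade-off: it sidesteps the RDD restriction but does not scale to very large files.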

I also need to process VCFs using the Glow library, and this doesn't work either.

Is there any documentation on what single user access mode actually is? Does it run using someone's credentials, like a service account? Can other users connect to it over ODBC/JDBC with an access token? Or is it personal compute that only allows one connection?


karthik_p
Esteemed Contributor

@Jakub K, there are a few limitations when migrating external tables (for example, optimization is not supported in Unity Catalog), and if you create the cluster with single user access mode you should be able to handle them. Please follow the steps below: the external location and credentials must be created and must have the proper access to perform the upgrade.

https://docs.databricks.com/data-governance/unity-catalog/migrate.html

I don't need help with migrating data from the Hive metastore.

I'm looking for some design patterns for ingesting new data into Unity Catalog.

Do I really need a dedicated cluster per user to be able to use Unity Catalog and load data? That can't be right.


@jakubk wrote:
Do I really need a dedicated cluster per user to be able to use Unity Catalog and load data? That can't be right.

I certainly hope that isn't the case. I can call spark.catalog.tableExists without issue from a Personal Compute cluster, but when I call it from a Shared Compute cluster with Access mode = Shared, I get this error:
"py4j.security.Py4JSecurityException: Method public boolean org.apache.spark.sql.internal.CatalogImpl.tableExists(java.lang.String) is not whitelisted ..."

How do I check whether a table exists from a shared cluster if I'm not allowed to use spark.catalog.tableExists?

I found a workaround for my particular situation where I just needed to check if a table existed. It was based on these posts:

Anonymous
Not applicable

Hi @Jakub K,

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

No, I haven't. I can't see any answers posted.

Anonymous
Not applicable

Hi @Jakub K,

I'm sorry you could not find a solution to your problem in the answers provided.

Our community strives to provide helpful and accurate information, but an immediate solution is not always available for every issue.

I suggest providing more information about your problem, such as specific error messages, error logs or details about the steps you have taken. This can help our community members better understand the issue and provide more targeted solutions.

Alternatively, you can consider contacting the support team for your product or service. They may be able to provide additional assistance or escalate the issue to the appropriate section for further investigation.

Thank you for your patience and understanding, and please let us know if there is anything else we can do to assist you.
