03-21-2023 05:59 AM
I'm looking to migrate onto Unity Catalog, but a number of my data ingestion notebooks throw a SecurityException/whitelist error for numerous spark.* functions.
Is there a configuration setting I need to enable to whitelist the spark.* methods/functions?
I know it's because I'm using 'shared' access mode. I've always run 'no isolation shared' clusters with external tables when using the Hive metastore.
I use externally managed tables and use spark.catalog to check whether a table exists before I create it. This is failing with the whitelist error. I guess I could refactor that check to query information_schema instead?
But any tips on how to refactor this?
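One possible shape for that refactor, sketched as an assumption rather than a verified pattern (the function name and parameters are mine, not from the thread): query information_schema through plain SQL, which shared access mode clusters do allow, instead of calling the restricted spark.catalog API.

```python
def table_exists(spark, catalog: str, schema: str, table: str) -> bool:
    """Check table existence via information_schema instead of
    spark.catalog.tableExists, which is not whitelisted on
    shared access mode clusters."""
    query = (
        f"SELECT 1 FROM {catalog}.information_schema.tables "
        f"WHERE table_schema = '{schema}' AND table_name = '{table}' "
        f"LIMIT 1"
    )
    return spark.sql(query).count() > 0
```

Names are interpolated directly here for brevity; in real code you would validate or escape them before building the query.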
I have multiple TSVs with free-text comments at the top of the file. I need to skip the first n lines and process the rest:
row_rdd = (
    spark.sparkContext
    .textFile(sourceFilePath)                    # read raw lines
    .zipWithIndex()                              # pair each line with its index
    .filter(lambda row: row[1] >= n_skip_rows)   # drop the leading comment lines
    .map(lambda row: row[0])                     # keep just the line text
)
df = spark.read.csv(row_rdd, sep='\t', header="true", inferSchema="true")
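The RDD API itself is among the things restricted on shared access mode clusters, so the zipWithIndex approach above can hit the same whitelist errors there. A minimal alternative sketch, assuming the files are small enough to pass through pandas (the sample data and n_skip_rows value are hypothetical):

```python
import io
import pandas as pd

# Hypothetical sample: two free-text comment lines, then a TSV header + data.
raw = "comment line 1\ncomment line 2\ncol_a\tcol_b\n1\tx\n2\ty\n"
n_skip_rows = 2

# pandas can skip leading lines natively, with no RDD APIs involved.
pdf = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=n_skip_rows)

# On Databricks you would then convert to a Spark DataFrame:
# df = spark.createDataFrame(pdf)
print(pdf.shape)  # (2, 2)
```

For large files this trades Spark's distributed read for a single-node one, so it is only a workaround, not a general replacement.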
I also need to process VCFs using the Glow library, and this doesn't work either.
Are there any docs on what single-user access mode actually is? Does it run using one user's credentials, like a service account? Can other users connect to it using ODBC/JDBC and an access token? Or is it personal compute that only allows one connection?
03-22-2023 09:27 AM
@Jakub K, there are a few limitations when migrating external tables; for example, optimization is not supported in Unity Catalog. If you create a cluster with single-user access mode, you should be able to handle that. Please follow the steps below: an external location and storage credentials should be created, with proper access to perform the upgrade.
https://docs.databricks.com/data-governance/unity-catalog/migrate.html
03-27-2023 04:28 AM
I don't need help migrating data from the Hive metastore.
I'm looking for design patterns for ingesting new data into Unity Catalog.
Do I really need a dedicated cluster per user to be able to use Unity Catalog and load data? That can't be right.
08-23-2023 03:11 PM
@jakubk wrote:
Do I really need a dedicated cluster per user to be able to use unity catalog & load data?? That can't be right
I certainly hope that isn't the case. I can call spark.catalog.tableExists without issue from a Personal Compute cluster, but when I try to call it from a Shared Compute cluster with access mode = Shared, I get this error:
"py4j.security.Py4JSecurityException: Method public boolean org.apache.spark.sql.internal.CatalogImpl.tableExists(java.lang.String) is not whitelisted ..."
How do I check if a table exists from a Shared Cluster if I'm not allowed to use spark.catalog.tableExists?
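One option, offered as a sketch under the assumption that SHOW TABLES is permitted where the CatalogImpl methods are not (SHOW TABLES goes through the SQL path rather than the restricted spark.catalog API):

```python
def table_exists(spark, schema: str, table: str) -> bool:
    # SHOW TABLES ... LIKE runs as plain SQL, avoiding the
    # non-whitelisted CatalogImpl.tableExists call entirely.
    rows = spark.sql(f"SHOW TABLES IN {schema} LIKE '{table}'").collect()
    return len(rows) > 0
```

The LIKE pattern here is an exact table name; SHOW TABLES also accepts wildcards if you need a broader match.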
08-23-2023 03:50 PM - edited 08-23-2023 04:01 PM
I found a workaround for my particular situation where I just needed to check if a table existed. It was based on these posts:
03-26-2023 11:41 PM
Hi @Jakub K
Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
03-30-2023 04:22 PM
No, I haven't. I can't see any answers posted?
03-31-2023 06:55 PM
Hi @Jakub K
I'm sorry you could not find a solution to your problem in the answers provided.
Our community strives to provide helpful and accurate information, but sometimes an immediate solution may not be available for every issue.
I suggest providing more information about your problem, such as specific error messages, error logs or details about the steps you have taken. This can help our community members better understand the issue and provide more targeted solutions.
Alternatively, you can consider contacting the support team for your product or service. They may be able to provide additional assistance or escalate the issue to the appropriate section for further investigation.
Thank you for your patience and understanding, and please let us know if there is anything else we can do to assist you.