How to resolve a problem with upgrading an external table to Unity Catalog?

vexor
Contributor

I have a problem with upgrading a table to Unity Catalog; I get this error:

Summary:

Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table calexception. Exception thrown when executing query:

SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE,A0.CREATE_TIME,A0.LAST_ACCESS_TIME,A0.OWNER,A0.RETENTION,A0.IS_REWRITE_ENABLED,A0.TBL_NAME,A0.TBL_TYPE,A0.TBL_ID FROM TBLS A0 LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID WHERE A0.TBL_NAME = ? AND B0.`NAME` = ?; UpgradeTableCommand `ps_dev`.`dd_omp`.`calexception`, `hive_metastore`.`dd_omp`.`calexception`, false, true, false, false, false , data: com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException:

Has anybody ever had a similar problem with upgrading external tables to Unity Catalog? I couldn't find a single clue about this issue. The full error message is attached.

This is my Spark config on the cluster:

spark.sql.hive.metastore.version 1.2.1

hive.metastore.schema.verification.record.version false

spark.databricks.service.port 8787

spark.databricks.driver.enableUserContextForPythonAndRCommands true

spark.sql.hive.metastore.jars /dbfs/databricks/hive_metastore_jars/*

hive.metastore.schema.verification false

spark.databricks.delta.preview.enabled true

spark.databricks.service.server.enabled true
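
(For reference, the metastore settings the session actually picked up can be checked from a notebook; a minimal sketch, assuming the cluster applied the config above:)

# Sketch: confirm the Hive metastore version and jar path the Spark
# session picked up from the cluster's advanced options.
print(spark.conf.get("spark.sql.hive.metastore.version"))
print(spark.conf.get("spark.sql.hive.metastore.jars"))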


Aviral-Bhardwaj
Esteemed Contributor III

@Michal Milosz​ Please set up a meeting on Sunday (IST time zone) at ardb40@gmail.com; we will connect and try to resolve it.

Could you please take a look at my issue one more time? I could not make a meeting on Sunday.

etsyal1e2r3
Honored Contributor

Well, your error is with the Hive metastore, which isn't an external table. Can you post the query you're using? You probably have to set the external location to your desired cloud path (and configure the external location in the Data tab). Basically, run your command but add MANAGED LOCATION before the path, like this...

# Set variables (angle-bracket values are placeholders to fill in)
catalog_name = "test_catalog"
schema_name = "test_schema"
catalog_container = "<catalog container string>"
storage_account = "<storage account string>"
blob_path = f"abfss://{catalog_container}@{storage_account}.dfs.core.windows.net/<folder path>"
 
# Create catalog if it doesn't exist
spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog_name} MANAGED LOCATION 'abfss://{catalog_container}@{storage_account}.dfs.core.windows.net/'")
 
# Create schema if it doesn't exist
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog_name}.{schema_name}")
 
# Read files into a DataFrame
df = spark.read.format('json').load(blob_path)
df.display()
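
If an external (rather than managed) table is the goal, a similar sketch works, assuming blob_path already holds Delta data and is covered by a configured Unity Catalog external location (the table name here is a placeholder, not from the thread):

# Sketch: register an EXTERNAL table over existing data by giving an
# explicit LOCATION instead of letting the catalog manage the storage.
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {catalog_name}.{schema_name}.my_external_table
    LOCATION '{blob_path}'
""")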

My table in the hive metastore (hive_metastore.dd_omp.versioncompatibility) is an EXTERNAL TABLE. I don't understand why I can select from it, but can't create the table in Unity Catalog, without setting the Hive version to 1.2.1.

I have two issues:

1) When I don't set the Hive version in the cluster's advanced options, I get the error:

"Parquet does not support timestamp. See HIVE-6384"

2) I found a solution to the above issue: set the Hive version to at least 1.2.1 in the cluster's advanced options. When I did that, I got the error I described in my first post.

etsyal1e2r3
Honored Contributor

The hive metastore is the default metastore within the Databricks workspace's resource group. Unless you named an external location "hive_metastore", I don't see how this is on an external location. Did you run the code I sent you? You have to make a new catalog and schema within your blob storage to use the table as a MANAGED TABLE. This is the preferred way; Databricks doesn't recommend using DBFS or the hive metastore for data.

I don't understand your point at all. I have created EXTERNAL TABLES in the hive_metastore catalog like this:

CREATE TABLE hive_metastore.dd_omp.versioncompatibility LOCATION 'dbfs:/mnt/{storage_account}/{blob_name}/{path_to_data}';

(I am using a mounted path here.)

I have no problem selecting the data until I set the Hive version in the cluster's advanced options.

I don't want to make it a MANAGED TABLE; I still need an EXTERNAL TABLE, but in Unity Catalog.
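
(For what it's worth, a Unity Catalog external location has to point at the underlying abfss:// URI rather than a dbfs:/mnt path, and the URI behind a mount can be looked up from a notebook; a minimal sketch with a placeholder mount name:)

# Sketch: find the cloud storage URI that backs a mount point, since
# Unity Catalog external locations cannot reference dbfs:/mnt paths.
for m in dbutils.fs.mounts():
    if m.mountPoint == "/mnt/<storage_account>":
        print(m.source)  # e.g. abfss://<container>@<account>.dfs.core.windows.net/...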

etsyal1e2r3
Honored Contributor

But that's my point: why are you mounting to DBFS instead of using an external blob location? You should clone your table to a managed table in blob storage, with your workspace enabled for Unity Catalog in the metastore.

Okay, I get it, but right now I can't clone my table because of the error I described earlier. I'm looking for a solution to these issues:

1) When I don't set the Hive version in the cluster's advanced options, I get the error:

"Parquet does not support timestamp. See HIVE-6384"

2) I found a solution to the above issue: set the Hive version to at least 1.2.1 in the cluster's advanced options. When I did that, I got the error I described in my first post.

etsyal1e2r3
Honored Contributor

How did you make that table? Can't you just make it in the blob location instead?

I didn't create this table myself, but a query like this was used:

CREATE TABLE hive_metastore.dd_omp.versioncompatibility LOCATION 'dbfs:/mnt/{storage_account}/{blob_name}/{path_to_data}';

I don't want to create it again in hive_metastore. I need to create the same table, based on the same path, but in Unity Catalog.

etsyal1e2r3
Honored Contributor

So, can you get the data into a DataFrame in Databricks so you can create the table? Are you able to access the data in that table with a SELECT statement? Have you made your own metastore for your external location and tied the workspace to it to enable Unity Catalog for the workspace? If you can do all those things, then you can do:

# Read the existing table, drop duplicate rows, and save it as a new table
df = spark.sql("SELECT * FROM <catalog>.<schema>.<table>")
df = df.distinct()
df.write.saveAsTable("<catalog>.<schema>.<table>")
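
Note that saveAsTable creates a managed table by default. If the result must stay external, one option (a sketch using Spark's DataFrameWriter path option; the abfss path is a placeholder) is:

# Sketch: writing with an explicit path option makes the resulting table
# external; the path must be covered by a Unity Catalog external location.
(df.write
   .option("path", "abfss://<container>@<storage account>.dfs.core.windows.net/<folder>")
   .saveAsTable("<catalog>.<schema>.<table>"))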

I also noticed that this was the table you were trying to select from:

'org.apache.hadoop.hive.metastore.model.MTable'

Where are the catalog, schema, and table in that string? If any of those has a dot in it, you need to surround each with backticks (``).

# Ex.
table = "`catalog`.`schema`.`table`"
df = spark.sql(f"SELECT * FROM {table}")

etsyal1e2r3
Honored Contributor

No luck trying this? What about the questions I asked?

vexor
Contributor

I used the SYNC TABLE command to create a new external table based on a table created in the hive metastore. This is my query:

SYNC TABLE ps_dev.dd_omp.versioncompatibility
FROM hive_metastore.dd_omp.versioncompatibility
DRY RUN

Everything started when I tried to sync the whole schema to the new catalog. I got the error for several tables: "Parquet does not support timestamp. See HIVE-6384". Then I changed the Hive version, and that's when the problems started. After setting the Hive version to 1.2.1 or later (because that resolved the Parquet issue), I got the error that I attached before.
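
(For reference, the whole-schema attempt presumably used the schema-level variant of SYNC; a minimal sketch with the names from this thread:)

# Sketch: sync every table in the schema at once; DRY RUN only reports
# what would happen without changing anything.
spark.sql("""
    SYNC SCHEMA ps_dev.dd_omp
    FROM hive_metastore.dd_omp
    DRY RUN
""").display()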

With this setting I can't even select the data from the table:

SELECT * FROM hive_metastore.dd_omp.versioncompatibility

ERROR MESSAGE:

Py4JError: An error occurred while calling o405.getCause. Trace:
py4j.security.Py4JSecurityException: Method public synchronized java.lang.Throwable javax.jdo.JDOException.getCause() is not whitelisted on class class javax.jdo.JDOException
	at py4j.security.WhitelistingPy4JSecurityManager.checkCall(WhitelistingPy4JSecurityManager.java:473)
	at py4j.Gateway.invoke(Gateway.java:305)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)

But when I change the settings and remove the Hive version, I can't use the SYNC TABLE command. Do you have any idea what is wrong? Maybe I should add something to the advanced options on my cluster. It is crucial to keep the Unity Catalog feature on the cluster.

Anonymous
Not applicable

Hi @Michal Milosz​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
