
SQL Endpoint with External Hive Metastore

Seth_J
New Contributor III

I am trying to set up a Databricks SQL endpoint that connects to an external Hive metastore. I have replicated my metastore Spark config in the SQL endpoint, with the addition of the following setting:

spark.sql.hive.metastore.jars maven

I keep getting the error:

Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

We are currently using Azure Databricks, Hive Metastore version 0.13, and an Azure MySQL database.

Here is the full SQL endpoint configuration:

spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/keyvault/db-secret-name}}
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://my-dbserver-name.mysql.database.azure.com:3306/my-db-name?sslMode=PREFERRED&verifyServerCertificate=true
spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver
spark.sql.hive.metastore.version 0.13
spark.hadoop.javax.jdo.option.ConnectionUserName adminuser@my-db-name
spark.sql.hive.metastore.jars maven
spark.hadoop.hive.metastore.schema.verification false
spark.hadoop.hive.metastore.schema.verification.record.version false
spark.sql.hive.metastore.schema.verification.record.version true
spark.hadoop.datanucleus.autoCreateSchema false
spark.sql.hive.metastore.schema.verification true
spark.hadoop.datanucleus.schema.autoCreateTables false
spark.hadoop.datanucleus.fixedDatastore true

Do SQL endpoints support Hive metastore version 0.13? Any other guidance here would be appreciated.
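(A quick check worth running before digging into the Hive settings is to confirm network reachability of the MySQL server. A sketch from a notebook shell cell in the same workspace, assuming nc is available on the node; the host and port come from the ConnectionURL above.)

%sh
# Resolve the metastore host and test TCP reachability on the MySQL port
nslookup my-dbserver-name.mysql.database.azure.com
nc -zv my-dbserver-name.mysql.database.azure.com 3306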

9 REPLIES

User16741082858
Contributor III

https://docs.databricks.com/data/metastores/external-hive-metastore.html

In particular:

# Spark specific configuration options
spark.sql.hive.metastore.version <hive-version>
# Skip this one if <hive-version> is 0.13.x.
spark.sql.hive.metastore.jars <hive-jar-source>

Seth_J
New Contributor III

Thanks for your reply. I tried without spark.sql.hive.metastore.jars first, but I saw a suggestion in another thread here to include spark.sql.hive.metastore.jars, albeit for a different metastore version. Without spark.sql.hive.metastore.jars specified, I get the same error.

Kaniz
Community Manager

Hi @SETH JENSEN, thank you for the update.

Just a quick follow-up: do you still need help? Please let us know.

Seth_J
New Contributor III

Yes, unfortunately I haven't been able to figure it out yet. I have also reached out to some Azure reps to see if anyone there might be able to help.

Vivian_Wilfred
Honored Contributor

Hi @SETH JENSEN, just checking in: were you able to solve this issue? If not, can you try modifying the connection string to sslMode=Require and see if it works?

Seth_J
New Contributor III

Thanks for the suggestion. I tried sslMode=Require, sslMode=REQUIRED, and sslMode=VERIFY_CA from the MySQL driver docs, but I am still receiving the same error.

Seth_J
New Contributor III

Thanks, this was a great suggestion for figuring out what's happening behind the scenes. Unfortunately, I am at a bit of a loss as to what to do next.

I am getting the error:

Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

I also tried using "com.mysql.cj.jdbc.Driver" as the driver name, since that is what is in the MySQL connector documentation.
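(One way to see whether any MySQL connector JAR is actually present on the node is to list the usual Databricks jar locations from a notebook shell cell; a sketch only, since these paths are assumptions and can vary across runtimes.)

%sh
# Look for a MySQL connector anywhere Spark is likely to pick it up
ls /databricks/hive_metastore_jars 2>/dev/null
ls /databricks/jars 2>/dev/null | grep -i mysql
# The Maven-downloaded metastore jars land in a temp dir like the one in the logs
find /local_disk0/tmp -maxdepth 1 -name 'hive-v*' 2>/dev/null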

I have tried downloading the correct driver by setting spark.sql.hive.metastore.jars to maven; however, the command below from the documentation gives me Permission denied errors:

%sh cp -r <path> /dbfs/hive_metastore_jar

<path> for me here was /local_disk0/tmp/hive-v13-4467a09c-d7cd-42cf-824f-47ca74cc7bc4 from the logs. I tried copying all of /local_disk0/tmp, but this directory was omitted from the copy.
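(If the Permission denied comes from the destination directory not existing, which is an assumption, creating it first and copying with a glob over the temp directory name from the logs may work.)

%sh
# Create the DBFS target first, then copy the Maven-downloaded jars into it
mkdir -p /dbfs/hive_metastore_jar
cp -r /local_disk0/tmp/hive-v13-*/* /dbfs/hive_metastore_jar/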

I then tried downloading the MySQL connector JAR (mysql-connector-java-8.0.30.jar) from Maven manually, uploading it to DBFS, and creating a global init script to copy the JAR to the local filesystem:

sleep 10
cp /dbfs/hive_metastore_jars/*.jar /databricks/hive_metastore_jars/

This global init script wasn't effective; it caused the cluster to fail to start, reporting that the script exited with a non-zero exit code. I could be doing something very wrong here, as I haven't had to do this copy process in the past.
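(For comparison, a sketch of a global init script that guards the two most common failure modes here: the local target directory not existing, and the DBFS FUSE mount not being ready yet. It assumes the jars were uploaded to /dbfs/hive_metastore_jars as described above.)

#!/bin/bash
# Wait for the DBFS FUSE mount to appear (assumption: it can lag cluster start)
while [ ! -d /dbfs/hive_metastore_jars ]; do sleep 2; done
# Create the local target before copying, so cp cannot fail on a missing dir
mkdir -p /databricks/hive_metastore_jars
cp /dbfs/hive_metastore_jars/*.jar /databricks/hive_metastore_jars/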

I also tried setting spark.sql.hive.metastore.jars to the DBFS path /dbfs/hive_metastore_jars/*, as I have successfully used that configuration in the past. This didn't work either; I am still getting the "driver not found in the CLASSPATH" error.
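(For reference, if the init-script copy had worked, the setting would point at the local path rather than the FUSE mount; a sketch, assuming the jars actually land there on every node:)

spark.sql.hive.metastore.jars /databricks/hive_metastore_jars/*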

Anything I am missing here or any ideas on how I can resolve this?

Thanks,

Seth

Seth_J
New Contributor III

I realized today what I was doing wrong.

I forgot to add the MySQL library on the Libraries tab of the cluster. Adding it resolved the issue for the cluster running Databricks Runtime 10.4, and I am now able to connect to the metastore without issue.

Is there a way to add an external maven library to a SQL Warehouse?
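(For the cluster side, the same Maven library can also be attached programmatically through the Libraries API rather than the Libraries tab; SQL Warehouses expose no equivalent tab. A sketch, where the workspace URL, token, and cluster ID are placeholders:)

%sh
# Install the MySQL connector on an interactive cluster via the REST API
curl -s -X POST "${DATABRICKS_HOST}/api/2.0/libraries/install" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -d '{
        "cluster_id": "<cluster-id>",
        "libraries": [
          {"maven": {"coordinates": "mysql:mysql-connector-java:8.0.30"}}
        ]
      }'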

Seth_J
New Contributor III

A Microsoft engineer suggested the same. This did get the SQL Warehouse connected to the metastore.

Odd, as this same configuration does not work with a standard Databricks cluster.

Thanks for your help!
