SQL Endpoint with External Hive Metastore

Seth_J
New Contributor III

I am trying to set up a Databricks SQL endpoint that connects to an external Hive metastore. I have replicated my metastore Spark config in the SQL endpoint, additionally adding the following to the configuration:

spark.sql.hive.metastore.jars maven

I keep getting the error:

Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

We are currently using Azure Databricks, Hive Metastore version 0.13, and an Azure MySQL database.

Here is the full SQL endpoint configuration:

spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/keyvault/db-secret-name}}
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://my-dbserver-name.mysql.database.azure.com:3306/my-db-name?sslMode=PREFERRED&verifyServerCertificate=true
spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver
spark.sql.hive.metastore.version 0.13
spark.hadoop.javax.jdo.option.ConnectionUserName adminuser@my-db-name
spark.sql.hive.metastore.jars maven
spark.hadoop.hive.metastore.schema.verification false
spark.hadoop.hive.metastore.schema.verification.record.version false
spark.sql.hive.metastore.schema.verification.record.version true
spark.hadoop.datanucleus.autoCreateSchema false
spark.sql.hive.metastore.schema.verification true
spark.hadoop.datanucleus.schema.autoCreateTables false
spark.hadoop.datanucleus.fixedDatastore true

Do SQL endpoints support Hive Metastore version 0.13? Any other guidance here would be appreciated.

8 Replies

User16741082858
Contributor III

https://docs.databricks.com/data/metastores/external-hive-metastore.html

In particular:

# Spark specific configuration options

spark.sql.hive.metastore.version <hive-version>

# Skip this one if <hive-version> is 0.13.x.

spark.sql.hive.metastore.jars <hive-jar-source>
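
A quick way to confirm whether a cluster with this configuration actually reaches the metastore is to list databases from a notebook. A minimal sketch, assuming a Python notebook attached to a cluster using the same Spark config:

# If the external Hive metastore is reachable, this lists its databases;
# if not, it fails with the same "Unable to instantiate
# org.apache.hadoop.hive.metastore.HiveMetaStoreClient" error.
spark.sql("SHOW DATABASES").show()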

Seth_J
New Contributor III

Thanks for your reply. I tried without spark.sql.hive.metastore.jars first, but I saw in another thread here a suggestion to include spark.sql.hive.metastore.jars, albeit for another version of the metastore. Without spark.sql.hive.metastore.jars specified, I get the same error.

Seth_J
New Contributor III

Yes, unfortunately I haven't been able to figure it out yet. I have also reached out to some Azure reps to see if anyone there might be able to help.

Vivian_Wilfred
Databricks Employee

Hi @SETH JENSEN, just checking if you were able to solve this issue. If not, can you try modifying the connection string to use sslMode=Require and see if it works?

Seth_J
New Contributor III

Thanks for the suggestion. I tried sslMode=Require, sslMode=REQUIRED, and sslMode=VERIFY_CA from the MySQL driver docs, but I am still receiving the same error.

Seth_J
New Contributor III

Thanks, this was a great suggestion for figuring out what's happening behind the scenes. Unfortunately, I am at a bit of a loss on what to do next.

I am getting the error:

Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

I also tried using "com.mysql.cj.jdbc.Driver" as the driver name, since that is what is in the MySQL connector documentation.
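
One way to narrow down this kind of classpath problem is to try loading the driver class directly through the JVM. A minimal sketch, assuming a Python notebook on an interactive cluster with the same config (this check isn't possible on the SQL endpoint itself):

# Succeeds silently if the connector JAR is on the driver classpath;
# raises an error wrapping java.lang.ClassNotFoundException if it is not.
spark._jvm.java.lang.Class.forName("com.mysql.cj.jdbc.Driver")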

I have tried downloading the correct driver per the documentation by setting spark.sql.hive.metastore.jars to maven; however, the command below from the documentation gives me Permission denied errors:

%sh cp -r <path> /dbfs/hive_metastore_jar

<path> for me here was /local_disk0/tmp/hive-v13-4467a09c-d7cd-42cf-824f-47ca74cc7bc4 from the logs. I tried copying all of /local_disk0/tmp, but this directory was omitted from the copy.
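
One workaround for the %sh permission issue may be to copy through dbutils instead, which goes through DBFS directly. A sketch, assuming the hive-v13-... directory from the logs is still on the driver's local disk (the UUID suffix will differ per cluster):

# Copy the Maven-downloaded metastore jars from the driver's local disk to DBFS.
jar_src = "file:/local_disk0/tmp/hive-v13-4467a09c-d7cd-42cf-824f-47ca74cc7bc4"
dbutils.fs.cp(jar_src, "dbfs:/hive_metastore_jars/", recurse=True)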

I then tried downloading the MySQL connector JAR (mysql-connector-java-8.0.30.jar) from Maven manually, uploading it to DBFS, and creating a global init script to copy the JAR to the local filesystem.

sleep 10
cp /dbfs/hive_metastore_jars/*.jar /databricks/hive_metastore_jars/

This global init script wasn't effective; it caused the cluster to fail to start, reporting that the script exited with a non-zero exit code. I could be doing something very wrong here; I haven't had to do this copy process in the past.
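
One possible cause of the non-zero exit: cp into /databricks/hive_metastore_jars fails if that directory does not exist when the script runs, and since cp is the script's last command, its failure becomes the script's exit code. A more defensive version could create the directory first; a sketch that writes such a script from a notebook (the init script path is hypothetical):

# Write an init script that creates the target directory before copying the jars.
script = """#!/bin/bash
set -e
sleep 10
mkdir -p /databricks/hive_metastore_jars
cp /dbfs/hive_metastore_jars/*.jar /databricks/hive_metastore_jars/
"""
dbutils.fs.put("dbfs:/init_scripts/copy_metastore_jars.sh", script, overwrite=True)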

I tried setting spark.sql.hive.metastore.jars to the DBFS path /dbfs/hive_metastore_jars/*, as I have successfully used this configuration in the past. This didn't work either; I am still getting the "driver not found in classpath" error.

Anything I am missing here or any ideas on how I can resolve this?

Thanks,

Seth

Seth_J
New Contributor III

I realized today what I was doing wrong.

I forgot to add the MySQL library in the Libraries tab of the cluster. Adding it resolved the issue for the cluster running Databricks Runtime 10.4; I am able to connect to the metastore without issue.
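
For a standard cluster, the Libraries tab step can also be done programmatically through the Databricks Libraries REST API. A sketch with placeholder workspace URL, token, and cluster ID, using the same connector version as the JAR above:

# Attach the MySQL connector from Maven Central to an existing cluster.
import requests

requests.post(
    "https://<workspace-url>/api/2.0/libraries/install",
    headers={"Authorization": "Bearer <token>"},
    json={
        "cluster_id": "<cluster-id>",
        "libraries": [{"maven": {"coordinates": "mysql:mysql-connector-java:8.0.30"}}],
    },
)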

Is there a way to add an external maven library to a SQL Warehouse?

Seth_J
New Contributor III

A Microsoft engineer suggested the same. This did get the SQL Warehouse connected to the metastore.

It is odd, as this same configuration does not work with a standard Databricks cluster.

Thanks for your help!
