04-26-2022 01:55 PM
I am trying to set up a Databricks SQL endpoint to connect to an external Hive metastore. I have replicated my metastore Spark config in the SQL endpoint, adding the line below to the configuration:
spark.sql.hive.metastore.jars maven
I keep getting the error:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
We are currently using Azure Databricks, Hive Metastore version 0.13, and an Azure MySQL database.
Here is the full SQL endpoint configuration:
spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/keyvault/db-secret-name}}
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://my-dbserver-name.mysql.database.azure.com:3306/my-db-name?sslMode=PREFERRED&verifyServerCertificate=true
spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver
spark.sql.hive.metastore.version 0.13
spark.hadoop.javax.jdo.option.ConnectionUserName adminuser@my-db-name
spark.sql.hive.metastore.jars maven
spark.hadoop.hive.metastore.schema.verification false
spark.hadoop.hive.metastore.schema.verification.record.version false
spark.sql.hive.metastore.schema.verification.record.version true
spark.hadoop.datanucleus.autoCreateSchema false
spark.sql.hive.metastore.schema.verification true
spark.hadoop.datanucleus.schema.autoCreateTables false
spark.hadoop.datanucleus.fixedDatastore true
Do SQL endpoints support Hive Metastore version 0.13? Any other guidance here would also be appreciated.
04-26-2022 09:11 PM
https://docs.databricks.com/data/metastores/external-hive-metastore.html
In particular:
# Spark specific configuration options
spark.sql.hive.metastore.version <hive-version>
# Skip this one if <hive-version> is 0.13.x.
spark.sql.hive.metastore.jars <hive-jar-source>
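So for Hive 0.13, a minimal reading of that excerpt would be just the version line, with no jars setting at all (untested, just going by the doc):
spark.sql.hive.metastore.version 0.13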
04-27-2022 07:02 AM
Thanks for your reply. I tried it without spark.sql.hive.metastore.jars first, but I saw a suggestion in another thread here to include spark.sql.hive.metastore.jars, albeit for a different metastore version. Without spark.sql.hive.metastore.jars specified, I get the same error.
05-16-2022 04:03 PM
Yes, unfortunately I haven't been able to figure it out yet. I have also reached out to some Azure reps to see if anyone there might be able to help.
06-24-2022 06:30 AM
Hi @SETH JENSEN, just checking in: were you able to solve this issue? If not, can you try modifying the connection string to use sslMode=Require and see if it works?
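For example, something along these lines (using the placeholder server and database names from your post):
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://my-dbserver-name.mysql.database.azure.com:3306/my-db-name?sslMode=Require&verifyServerCertificate=true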
07-06-2022 03:10 PM
Thanks for the suggestion. I tried sslMode=Require, sslMode=REQUIRED, and sslMode=VERIFY_CA from the MySQL driver docs, but I am still receiving the same error.
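To at least rule out basic network connectivity from the workspace, I believe a quick port check from a notebook on a regular cluster should work (a rough sketch in plain bash, using the placeholder hostname from above):
%sh timeout 5 bash -c '</dev/tcp/my-dbserver-name.mysql.database.azure.com/3306' && echo "port 3306 reachable"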
07-27-2022 01:21 PM
Thanks, this was a great suggestion to try to figure out what's happening behind the scenes. Unfortunately, I am at a bit of a loss on what to do next.
I am getting the error:
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
I also tried using "com.mysql.cj.jdbc.Driver" as the driver name, since that is what is in the MySQL connector documentation.
I have tried downloading the correct driver per the documentation by setting spark.sql.hive.metastore.jars to maven; however, the command below from the documentation gives me Permission denied errors:
%sh cp -r <path> /dbfs/hive_metastore_jar
<path> for me here was /local_disk0/tmp/hive-v13-4467a09c-d7cd-42cf-824f-47ca74cc7bc4 from the logs. I tried copying all of /local_disk0/tmp, but this directory was omitted from the copy.
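For anyone following along, this is roughly the copy I was attempting, with some diagnostics added (a sketch; cp -v should at least show which file triggers the Permission denied):
%sh
ls -ld /local_disk0/tmp/hive-v13-*
mkdir -p /dbfs/hive_metastore_jar
cp -rv /local_disk0/tmp/hive-v13-*/* /dbfs/hive_metastore_jar/ 2>&1 | tail -n 5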
I then tried downloading the MySQL connector JAR (mysql-connector-java-8.0.30.jar) from Maven manually, uploading it to DBFS, and creating a global init script to copy the JAR to the local filesystem:
sleep 10
cp /dbfs/hive_metastore_jars/*.jar /databricks/hive_metastore_jars/
This global init script wasn't effective; it caused the cluster to fail to start, with a message that it exited with a non-zero exit code. I could be doing something very wrong here, as I haven't had to do this copy process in the past.
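For reference, a more defensive version of the script I was attempting would look something like this (a sketch, assuming a bash init script and the /dbfs FUSE mount; the target directory has to exist or cp exits non-zero, which alone can kill the cluster start):
#!/bin/bash
set -e
# give the DBFS FUSE mount a moment to come up
sleep 10
# create the target dir first; cp exits non-zero if it is missing
mkdir -p /databricks/hive_metastore_jars
cp /dbfs/hive_metastore_jars/*.jar /databricks/hive_metastore_jars/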
I tried setting spark.sql.hive.metastore.jars to the DBFS path /dbfs/hive_metastore_jars/*, as I have successfully used this configuration in the past. This didn't work either; I am still getting the "driver not found in the CLASSPATH" error.
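One sanity check worth running on a cluster with the same config is to confirm the JARs are actually visible at that path (note the docs linked above expect this directory to contain the full set of Hive client JARs, not just the MySQL connector):
%sh ls -l /dbfs/hive_metastore_jars/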
Anything I am missing here or any ideas on how I can resolve this?
Thanks,
Seth
07-28-2022 02:05 PM
I realized today what I was doing wrong.
I forgot to add the MySQL connector library in the Libraries tab of the cluster. This resolved the issue for the cluster on Databricks Runtime 10.4; I am able to connect to the metastore without issue.
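For anyone scripting this rather than using the UI, I believe the legacy databricks-cli can attach the same Maven library (a sketch; the cluster ID is a placeholder):
databricks libraries install \
  --cluster-id 0123-456789-abcde \
  --maven-coordinates mysql:mysql-connector-java:8.0.30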
Is there a way to add an external maven library to a SQL Warehouse?
08-17-2022 09:14 AM
A Microsoft engineer suggested the same. This did get the SQL Warehouse connected to the metastore.
Odd, as this same configuration does not work with a standard Databricks cluster.
Thanks for your help!