04-26-2022 01:55 PM
I am trying to set up a Databricks SQL endpoint that connects to an external Hive metastore. I have replicated my metastore Spark config in the SQL endpoint, additionally adding the line below to the configuration:
spark.sql.hive.metastore.jars maven
I keep getting the error
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
We are currently using Azure Databricks, Hive Metastore version 0.13, and an Azure MySQL database.
Here is the full SQL endpoint configuration:
spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/keyvault/db-secret-name}}
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://my-dbserver-name.mysql.database.azure.com:3306/my-db-name?sslMode=PREFERRED&verifyServerCertificate=true
spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver
spark.sql.hive.metastore.version 0.13
spark.hadoop.javax.jdo.option.ConnectionUserName adminuser@my-db-name
spark.sql.hive.metastore.jars maven
spark.hadoop.hive.metastore.schema.verification false
spark.hadoop.hive.metastore.schema.verification.record.version false
spark.sql.hive.metastore.schema.verification.record.version true
spark.hadoop.datanucleus.autoCreateSchema false
spark.sql.hive.metastore.schema.verification true
spark.hadoop.datanucleus.schema.autoCreateTables false
spark.hadoop.datanucleus.fixedDatastore true
Do SQL endpoints support Hive Metastore version 0.13? Any other guidance here would be appreciated.
04-26-2022 09:11 PM
https://docs.databricks.com/data/metastores/external-hive-metastore.html
In particular:
# Spark specific configuration options
spark.sql.hive.metastore.version <hive-version>
# Skip this one if <hive-version> is 0.13.x.
spark.sql.hive.metastore.jars <hive-jar-source>
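So for your case, the metastore lines should reduce to just the version, something like this (a sketch based on the doc's note about 0.13.x):
spark.sql.hive.metastore.version 0.13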
04-27-2022 07:02 AM
Thanks for your reply. I tried without spark.sql.hive.metastore.jars first, but I saw in another thread here a suggestion to include spark.sql.hive.metastore.jars, albeit for a different metastore version. Without spark.sql.hive.metastore.jars specified, I get the same error.
05-16-2022 04:03 PM
Yes, unfortunately I haven't been able to figure it out yet. I have also reached out to some Azure reps to see if anyone there might be able to help.
06-24-2022 06:30 AM
Hi @SETH JENSEN, just checking: were you able to solve this issue? If not, can you try modifying the connection string to use sslMode=Require and see if it works?
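For example, the connection URL from your config would then look something like this (a sketch, with only the sslMode value changed):
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://my-dbserver-name.mysql.database.azure.com:3306/my-db-name?sslMode=Require&verifyServerCertificate=true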
07-06-2022 03:10 PM
Thanks for the suggestion. I tried sslMode=Require, sslMode=REQUIRED, and sslMode=VERIFY_CA from the MySQL driver docs, but I am still receiving the same error.
07-27-2022 01:21 PM
Thanks, this was a great suggestion to try to figure out what's happening behind the scenes. Unfortunately, I am at a bit of a loss on what to do next.
I am getting the error:
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
I also tried using "com.mysql.cj.jdbc.Driver" as the driver name, since that is what is in the MySQL connector documentation.
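That is, I swapped the driver config line to the following (a sketch; same error either way):
spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.cj.jdbc.Driver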
I have tried downloading the correct driver from the documentation by setting spark.sql.hive.metastore.jars to maven; however, the command below from the documentation gives me Permission denied errors:
%sh cp -r <path> /dbfs/hive_metastore_jar
<path> for me here was /local_disk0/tmp/hive-v13-4467a09c-d7cd-42cf-824f-47ca74cc7bc4 from the logs. I tried copying all of /local_disk0/tmp, but this directory was omitted from the copy.
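Concretely, the command I ran (with the path from the logs substituted in) was:
%sh cp -r /local_disk0/tmp/hive-v13-4467a09c-d7cd-42cf-824f-47ca74cc7bc4 /dbfs/hive_metastore_jar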
I then tried downloading the MySQL connector JAR (mysql-connector-java-8.0.30.jar) from Maven manually, uploading it to DBFS, and creating a global init script to copy the JAR to the local filesystem:
sleep 10
cp /dbfs/hive_metastore_jars/*.jar /databricks/hive_metastore_jars/
This global init script wasn't effective; it caused the cluster to fail to start with a non-zero exit code. I could be doing something very wrong here, as I haven't had to do this copy process in the past.
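In case it helps anyone spot the problem, here is a fuller sketch of what I was attempting (with a shebang and a mkdir added, since I suspect the script failing on a missing destination directory could explain the non-zero exit):
#!/bin/bash
set -e
# give DBFS a moment to mount before reading from /dbfs
sleep 10
# create the destination first; cp exits non-zero if it does not exist
mkdir -p /databricks/hive_metastore_jars/
cp /dbfs/hive_metastore_jars/*.jar /databricks/hive_metastore_jars/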
I also tried setting spark.sql.hive.metastore.jars to the DBFS path /dbfs/hive_metastore_jars/*, as I have successfully used this configuration in the past. This didn't work; I am still getting the "driver not found in CLASSPATH" error.
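That is, the config line I tried was:
spark.sql.hive.metastore.jars /dbfs/hive_metastore_jars/*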
Anything I am missing here or any ideas on how I can resolve this?
Thanks,
Seth
07-28-2022 02:05 PM
I realized today what I was doing wrong.
I forgot to add the mysql library in the Libraries tab of the cluster. This resolved the issue for the cluster with Databricks Runtime 10.4. I am able to connect to the metastore without issue.
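For anyone else hitting this: in the Libraries tab I added the connector as a Maven coordinate matching the JAR version mentioned above, i.e. something like:
mysql:mysql-connector-java:8.0.30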
Is there a way to add an external maven library to a SQL Warehouse?
08-17-2022 09:14 AM
A Microsoft engineer suggested the same. This did get the SQL Warehouse connected to the metastore.
Odd, as this same configuration does not work with a standard Databricks cluster.
Thanks for your help!