โ07-27-2023 05:27 PM
I have created a Databricks cluster with below configurations.
Databricks Runtime Version
13.2 ML (includes Apache Spark 3.4.0, Scala 2.12)
Node type
i3.xlarge
30.5 GB Memory, 4 Cores
I created a notebook and trying to load the Mysql table which resides in AWS RDS.
So, I have installed this jar file `mysql-connector-j-8.0.33.jar` into my cluster then I tried executing the below code in a notebook.
table = "test_table"
database_host = "test-db-dev.cluster-ro-pkh5s13hvg.us-west-2.rds.amazonaws.com"
database_port = "3306"
database_name = "test_db"
user = "test_user"
password = "12345678"
url = f"jdbc:mysql://{database_host}:{database_port}/{database_name}"
df_rds_table = spark.read \
.format("jdbc") \
.option("driver","com.mysql.cj.jdbc.Driver") \
.option("url", url) \
.option("dbtable", table) \
.option("user", user) \
.option("password", password) \
.load()
The command execution was running for long time and finally ended up with failed status by throwing the below error.
java.lang.Exception: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:167)
at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:114)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:152)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:364)
at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:151)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:323)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:302)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:297)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:60)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$resourceLoader$1(HiveSessionStateBuilder.scala:67)
at org.apache.spark.sql.hive.HiveSessionResourceLoader.client$lzycompute(HiveSessionStateBuilder.scala:166)
at org.apache.spark.sql.hive.HiveSessionResourceLoader.client(HiveSessionStateBuilder.scala:166)
at org.apache.spark.sql.hive.HiveSessionResourceLoader.$anonfun$addJar$1(HiveSessionStateBuilder.scala:170)
at org.apache.spark.sql.hive.HiveSessionResourceLoader.$anonfun$addJar$1$adapted(HiveSessionStateBuilder.scala:169)
at scala.collection.immutable.List.foreach(List.scala:431)
.
.
.
I observed that after installing the library I could see the below status in Cluster's Event log.
โ07-27-2023 10:09 PM
@DineshKumar
I'm not 100% sure what the issue could be, but my lucky guess is that you're reinstalling the JDBC driver that is used to connect to Hive Metastore (it's on MySQL though).
Try to use the one that is already installed: https://docs.databricks.com/external-data/mysql.html
โ07-28-2023 03:07 AM
I tried uninstalling the JDBC driver but this time I am getting the below classNotFound error.
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
Also, I tried using the one (jdbc) from above link which you shared. I am getting the below error.
java.sql.SQLInvalidAuthorizationSpecException: Could not connect to address=(host=test-db-dev.cluster-ro-pkh5s13hvg.us-west-2.rds.amazonaws.com)
(port=3306)(type=master) : (conn=12345676) Access denied for user 'test_user'@'10.2255.1134.111' (using password: YES)
โ07-28-2023 03:32 AM
@DineshKumar
Don't try to remove it. Just don't install it ๐
About the error - it's clearly permissions issue to the database. Firstly check if the user has got an access to the database. Secondly, check if there's no firewall - if yes make sure that the address range that you're connecting from is whitelisted.
โ07-28-2023 03:51 AM
I have checked that the user has read only access to the database. I connected through Dbeaver with same credentials.
Secondly, can you guide me on how to check if there is no firewall or not. Where can I check this. Kindly guide me on this.
Thanks!
โ07-30-2023 10:01 PM
@DineshKumar
Please check this stackoverflow question:
https://stackoverflow.com/questions/8380797/enable-remote-mysql-connection-error-1045-28000-access-d...
โ07-30-2023 11:05 PM - edited โ07-30-2023 11:20 PM
Hi, The below error describes that there is an issue connecting to the host from Databricks, you can find more details about the network configurations here at https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html
Could not connect to address=(host=test-db-dev.cluster-ro-pkh5s13hvg.us-west-2.rds.amazonaws.com) (port=3306)(type=master) : (conn=12345676) Access denied for user 'test_user'@'10.2255.1134.111' (using password: YES)
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group