10-27-2022 04:55 AM
Hello All,
I get the following error while trying to create a database:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
Scripts used:
%sql
USE hive_metastore;
CREATE DATABASE XYZ;
%sql
CREATE DATABASE XYZ;
%sql
CREATE DATABASE hive_metastore.XYZ;
The SQL warehouse appears to be in a started state.
Cluster details:
11-27-2022 10:50 PM
Hi @Karthigesan Vijayakumar
Great to meet you, and thanks for your question!
Let's see if your peers in the community have an answer to your question first; otherwise, bricksters will get back to you soon.
Thanks.
12-14-2022 02:54 AM
I have run into the same issue. Any update on this?
02-02-2023 10:38 AM
Facing the same error. Any updates on this?
02-18-2023 09:01 AM
This issue is getting worse: it's happening more often and persisting for longer. It's getting harder and harder to work around.
Please do something. The error is clearly not on the customers' side.
12-14-2022 08:10 AM
Same issue:
Starting about a month ago, we've been getting these errors on jobs/workflows that had been running successfully for years, without any code changes.
No idea what is causing it or how to fix it, but adding a sleep/pause at the beginning of the notebook seems to help... so maybe something takes a while to initialize on the cluster. 🤷‍♂️
Jobs are running on DBR 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
In our case, it's hard to debug because we're using pyspark, and the traceback only shows:
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
121 # Hide where the exception came from that shows a non-Pythonic
122 # JVM exception message.
--> 123 raise converted from None
124 else:
125 raise
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
12-20-2022 02:47 AM
Please let us know if you found a solution for this issue.
02-18-2023 09:06 AM
Still no solution. Pausing the script is only a stopgap measure.
The issue is on Databricks' side, there's nothing we can do about it, and it seems to be getting worse.
@Vidula Khanna: Any feedback from Databricks on this?
12-20-2022 07:51 AM
@Vijay Kumar J: So far, adding a sleep/pause at the top of the notebook has been the only thing that works:
# Sleep/Pause for 2 minutes, to give the Hive Catalog time to initialize.
import time
time.sleep(120)
It has reduced the errors by 99%. We still get them occasionally, so maybe a longer pause would be enough to handle that last 1%.
12-20-2022 07:13 PM
I tried that, but got the same error. I also tried creating the database manually through the UI, with the same result:
HTTP ERROR: 500
Problem accessing /import/new-table. Reason:
com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
12-21-2022 02:08 AM
Our issue is resolved; it was caused by a firewall that was blocking certain commands.
01-03-2023 05:07 AM
Got the same issue with both the UI and a notebook. Tried a sleep/pause at the top of the notebook, but it didn't work. Please let me know if you found any other solution for this issue.
02-21-2023 12:23 PM
Alright, we've implemented a workaround for this, and so far it's been working very well. Here is the code:
import time

retries = 0
max_retries = 10
while True:
    try:
        # Use this table to check if Hive is ready, since it's very small & all in 1 file
        table("database.small_table")
        break
    except Exception as e:
        if retries == max_retries:
            raise e
        retries += 1
        print(f"Hive is not initialized yet. Retrying in 60 seconds. (Retry #{retries})")
        time.sleep(60)
print("Hive is initialized!")
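For anyone adapting that loop: a variant with exponential backoff probes less aggressively when the metastore is slow to come up. This is a sketch, not Databricks-provided code; `check` is a placeholder for whatever cheap read you use (e.g. a small-table lookup), and the delay values are assumptions to tune:

```python
import time

def wait_for_metastore(check, max_retries=10, base_delay=60, sleep=time.sleep):
    """Retry check() until it succeeds, doubling the delay after each failure.

    check: any callable that raises while the metastore is still initializing,
    e.g. lambda: spark.table("database.small_table").
    Returns the number of attempts it took; re-raises after max_retries.
    """
    delay = base_delay
    for attempt in range(1, max_retries + 1):
        try:
            check()
            return attempt
        except Exception:
            if attempt == max_retries:
                raise
            print(f"Hive is not initialized yet. Retrying in {delay}s (attempt #{attempt}).")
            sleep(delay)
            delay *= 2
```

Calling wait_for_metastore(lambda: spark.table("database.small_table")) at the top of the notebook would then replace the fixed 60-second loop.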
And here is what the output looks like:
02-22-2023 07:15 PM
Alright, good news! We've had one job fail after the 10 maximum retries, and it ended up producing a much more complete stack trace than the single `AnalysisException` we typically get.
tl;dr: It seems like the underlying issue (in our case, at least) is too many connections to the Hive metastore, which is basically a MariaDB instance hosted by Databricks. This answer provides some context behind the error.
The full stack trace is attached below, where we can see that the originating exception (at the bottom) is a SQLException due to too many connections, thrown from the org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol class. Here is the snippet:
Caused by: java.sql.SQLException: Too many connections
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.authentication(AbstractConnectProtocol.java:856)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.handleConnectionPhases(AbstractConnectProtocol.java:777)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:451)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1103)
This, along with the original HiveMetaStoreClient exception, pretty much confirms that the root cause of the issue is indeed too many connections to the Hive metastore (the MariaDB instance).
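If many concurrent clusters or tasks are exhausting the metastore's connection limit, one mitigation (a sketch, not an official Databricks recommendation) is to shrink each cluster's metastore connection pool via Spark config. datanucleus.connectionPool.maxPoolSize is a standard DataNucleus property; the value below is an assumption you would tune for your workload:

```
# Cluster Spark config (Clusters > Advanced options > Spark)
# Limit the number of JDBC connections each cluster opens to the metastore.
spark.hadoop.datanucleus.connectionPool.maxPoolSize 3
```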
07-10-2024 06:49 AM
That's exactly my case!
This is what I saw in `databricks bundle run my_job`:
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
And this is what I found in the Log4j output of the cluster:
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "HikariCP" plugin to create a ConnectionPool gave an error : Failed to initialize pool: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com)(port=3306)(type=master) : Could not connect to consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 : Connection reset
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:117)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
... 123 more
Caused by: com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com)(port=3306)(type=master) : Could not connect to consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 : Connection reset
at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:512)
at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:105)
at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:71)
at org.datanucleus.store.rdbms.connectionpool.HikariCPConnectionPoolFactory.createConnectionPool(HikariCPConnectionPoolFactory.java:176)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:213)
... 125 more
Caused by: java.sql.SQLNonTransientConnectionException: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com)(port=3306)(type=master) : Could not connect to consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 : Connection reset
at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:73)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:197)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1404)
at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:635)
at org.mariadb.jdbc.MariaDbConnection.newConnection(MariaDbConnection.java:150)
at org.mariadb.jdbc.Driver.connect(Driver.java:89)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:341)
at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:506)
... 129 more
Caused by: java.sql.SQLNonTransientConnectionException: Could not connect to consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 : Connection reset
at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:73)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:188)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.createConnection(AbstractConnectProtocol.java:588)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1399)
... 136 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream.fillBuffer(ReadAheadBufferedStream.java:131)
at org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream.read(ReadAheadBufferedStream.java:104)
at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacketArray(StandardPacketInputStream.java:247)
at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacket(StandardPacketInputStream.java:218)
at org.mariadb.jdbc.internal.com.read.ReadInitialHandShakePacket.<init>(ReadInitialHandShakePacket.java:89)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.createConnection(AbstractConnectProtocol.java:540)
... 137 more
consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 is mentioned in https://learn.microsoft.com/en-us/azure/databricks/release-notes/product/2022/january#additional-met....