Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient while trying to create a database

Karthig
New Contributor III

Hello All,

I get the error org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient while trying to create a database.

Scripts used:

%sql
USE hive_metastore;
CREATE DATABASE XYZ;

%sql
CREATE DATABASE XYZ;

%sql
CREATE DATABASE hive_metastore.XYZ;

The warehouse appears to be in a started state.

[Screenshot: warehouse status]

Cluster details

[Screenshot: cluster configuration]

15 Replies

Anonymous
Not applicable

Hi @Karthigesan Vijayakumar,

Great to meet you, and thanks for your question! 

Let's see if your peers in the community have an answer to your question first. Otherwise, bricksters will get back to you soon.

Thanks.

Mentens
New Contributor II

I have run into the same issue. Any update on this?

addy
New Contributor III

Facing the same error. Any updates on this?

This issue is getting worse: it's happening more often, and persisting for longer periods of time. It's getting harder & harder to work around it.

Please do something. The error is clearly not on the customers' side.

mroy
Contributor

Same issue:

Starting about one month ago, we've been getting these errors on jobs/workflows that have been running successfully for years, without any code change.

No idea what is causing it, or how to fix it, but it seems like adding a sleep/pause at the beginning of the notebook is helping... so maybe something is taking a while to initialize on the cluster. 🤷‍♂️

Jobs are running on DBR 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).

In our case, it's hard to debug, because we're using pyspark, and:

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306 
 
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    121                 # Hide where the exception came from that shows a non-Pythonic
    122                 # JVM exception message.
--> 123                 raise converted from None
    124             else:
    125                 raise
 
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
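One thing that might make the PySpark side easier to debug (assuming Spark 3.2 on DBR 10.4 behaves like open-source Spark here) is asking PySpark to include the JVM stack trace in the converted exception:

# Ask PySpark to append the JVM stack trace to exceptions surfaced in Python,
# so we see the underlying Hive/metastore error instead of just AnalysisException.
spark.conf.set("spark.sql.pyspark.jvmStacktrace.enabled", "true")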

Vijaykumarj
New Contributor III

Please let us know if you found any solution for this issue.

Still no solution. Pausing the script is only a stopgap measure.

The issue is on Databricks' side, there's nothing we can do about it, and it seems to be getting worse.

@Vidula Khanna: Any feedback from Databricks on this?

mroy
Contributor

@Vijay Kumar J: So far, adding a sleep/pause at the top of the notebook has been the only thing that works:

# Sleep/Pause for 2 minutes, to give the Hive Catalog time to initialize.
import time
time.sleep(120)

It has reduced the errors by 99%. We still get them occasionally, so maybe a longer pause would be enough to handle that last 1%.
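If a longer pause still leaves that last 1%, another idea (just a sketch on our side, nothing we've validated) is to randomize the pause so that jobs launched at the same time don't all open metastore connections at the same moment:

# Randomized pause: 2 minutes plus up to 60 seconds of jitter, so concurrently
# starting jobs spread out their first metastore calls. Both numbers are arbitrary.
import random
import time

time.sleep(120 + random.uniform(0, 60))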

Vijaykumarj
New Contributor III

Tried that, but got the same error. I also tried creating it manually through the UI, but got the same error:

HTTP ERROR: 500

Problem accessing /import/new-table. Reason:

com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Mentens
New Contributor II

Our issue is resolved. It was related to a firewall that was blocking us from performing certain commands.
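If you suspect the same thing, a rough way to check from a notebook (the host below is just the Azure consolidated metastore endpoint mentioned later in this thread; substitute whatever your workspace actually uses) is to try opening a plain TCP connection to the metastore:

# Rough connectivity check from the cluster to the Hive metastore endpoint.
# Host and port are examples; replace them with your workspace's metastore endpoint.
import socket

host = "consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com"
port = 3306

try:
    with socket.create_connection((host, port), timeout=10):
        print(f"TCP connection to {host}:{port} succeeded")
except OSError as e:
    print(f"TCP connection to {host}:{port} failed: {e}")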

Srividya1
New Contributor II

Got the same issue with both the UI and a notebook. Tried the sleep/pause at the top of the notebook, but it didn't work. Please let me know if you found any other solution for this issue.

mroy
Contributor

Alright, we've implemented a workaround for this, and so far it's been working very well:

  • First, we created a reusable notebook to wait until Hive has been initialized (see code below).
  • We then execute this notebook using the %run command at the top of any notebook which is encountering the Hive issue (there's a usage sketch after the output below).

Here is the code:

import time

# Wait until the Hive metastore is reachable before letting the rest of the notebook run.
retries = 0
max_retries = 10
while True:
  try:
    # Use this table to check if Hive is ready, since it's very small & all in 1 file.
    # Resolving the table name forces a metastore lookup, which is what we want to test.
    spark.table("database.small_table")
    break
  except Exception as e:
    if retries == max_retries:
      raise e

    retries += 1
    print(f"Hive is not initialized yet. Retrying in 60 seconds. (Retry #{retries})")
    time.sleep(60)

print("Hive is initialized!")

And here is what the output looks like:

[Screenshot: notebook output showing the retry messages]
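For completeness, a calling notebook then just starts with a %run cell pointing at the wait notebook (the path here is made up; use wherever you saved yours):

%run ./utils/wait_for_hive_metastore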

Alright, good news! We've had one job fail after the 10 maximum retries, and it ended up producing a much more complete stack trace than the single `AnalysisException` we typically get.

tl;dr: It seems like the underlying issue (in our case, at least) is too many connections to the Hive metastore, which is basically a MariaDB instance hosted by Databricks. This answer provides some context behind the error.

The full stack trace is attached below, where we can see that the originating exception (at the bottom) is an SQLException due to too many connections, thrown from org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol. Here is the snippet:

Caused by: java.sql.SQLException: Too many connections
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.authentication(AbstractConnectProtocol.java:856)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.handleConnectionPhases(AbstractConnectProtocol.java:777)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:451)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1103)

This, along with the original HiveMetaStoreClient exception, pretty much confirms that the root cause of the issue is indeed too many connections to the Hive metastore (the MariaDB instance).
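If connection exhaustion really is the root cause, one mitigation that might be worth experimenting with (an assumption on our side; we don't know whether the Databricks-managed consolidated metastore honours these settings) is capping the connection pool each cluster opens against the metastore. datanucleus.connectionPool.* are standard DataNucleus properties, and the spark.hadoop. prefix passes them through to the Hive client, so in the cluster's Spark config that would look like:

spark.hadoop.datanucleus.connectionPool.maxPoolSize 3
spark.hadoop.datanucleus.connectionPool.minPoolSize 1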

JacekLaskowski
New Contributor III

That's exactly my case!

This is what I saw in `databricks bundle run my_job`:

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

And this is what I found in the Log4j output of the cluster:

Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "HikariCP" plugin to create a ConnectionPool gave an error : Failed to initialize pool: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com)(port=3306)(type=master) : Could not connect to consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 : Connection reset
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:117)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
	... 123 more
Caused by: com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com)(port=3306)(type=master) : Could not connect to consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 : Connection reset
	at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:512)
	at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:105)
	at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:71)
	at org.datanucleus.store.rdbms.connectionpool.HikariCPConnectionPoolFactory.createConnectionPool(HikariCPConnectionPoolFactory.java:176)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:213)
	... 125 more
Caused by: java.sql.SQLNonTransientConnectionException: Could not connect to address=(host=consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com)(port=3306)(type=master) : Could not connect to consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 : Connection reset
	at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:73)
	at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:197)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1404)
	at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:635)
	at org.mariadb.jdbc.MariaDbConnection.newConnection(MariaDbConnection.java:150)
	at org.mariadb.jdbc.Driver.connect(Driver.java:89)
	at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
	at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
	at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:341)
	at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:506)
	... 129 more
Caused by: java.sql.SQLNonTransientConnectionException: Could not connect to consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 : Connection reset
	at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:73)
	at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:188)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.createConnection(AbstractConnectProtocol.java:588)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1399)
	... 136 more
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:210)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream.fillBuffer(ReadAheadBufferedStream.java:131)
	at org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream.read(ReadAheadBufferedStream.java:104)
	at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacketArray(StandardPacketInputStream.java:247)
	at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacket(StandardPacketInputStream.java:218)
	at org.mariadb.jdbc.internal.com.read.ReadInitialHandShakePacket.<init>(ReadInitialHandShakePacket.java:89)
	at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.createConnection(AbstractConnectProtocol.java:540)
	... 137 more

consolidated-northeuropec2-prod-metastore-0.mysql.database.azure.com:3306 is mentioned in https://learn.microsoft.com/en-us/azure/databricks/release-notes/product/2022/january#additional-met....
