
Facing Issues with Databricks JDBC Connectivity after Idle time

AbhiJ
New Contributor III

Hello team, 

I am using a commons-dbcp2 DataSource, which provides connection pooling by default, in a Spring Java application (REST services that fetch Databricks data via JdbcTemplate).

Initially everything works fine: I can fetch the data from Databricks via JdbcTemplate using a Databricks connection already created in the connection pool.

The Databricks JDBC URL format is:

jdbc:databricks://dbc-f2536520-fc66.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/7781542073893090/0329-163852-fxzadblp;AuthMech=3;UID=token;PWD=<PWDTOKEN>

The Databricks jar I am using for the datasource connection is:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>databricks-jdbc</artifactId>
    <version>2.6.33</version>
</dependency>
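For context, the setup described above (a commons-dbcp2 pool feeding a Spring JdbcTemplate over the Databricks JDBC driver) can be wired roughly like this. This is a minimal sketch: the class and bean names are illustrative, and the URL is the one from the post with the token placeholder left as-is.

```java
import javax.sql.DataSource;

import org.apache.commons.dbcp2.BasicDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class DatabricksConfig {

    // Pooled datasource over the Databricks JDBC driver
    // (com.databricks:databricks-jdbc:2.6.33).
    @Bean
    public DataSource databricksDataSource() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.databricks.client.jdbc.Driver");
        ds.setUrl("jdbc:databricks://dbc-f2536520-fc66.cloud.databricks.com:443/default;"
                + "transportMode=http;ssl=1;"
                + "httpPath=sql/protocolv1/o/7781542073893090/0329-163852-fxzadblp;"
                + "AuthMech=3;UID=token;PWD=<PWDTOKEN>");
        return ds;
    }

    // JdbcTemplate backed by the pooled datasource, used by the REST services.
    @Bean
    public JdbcTemplate databricksJdbcTemplate(DataSource databricksDataSource) {
        return new JdbcTemplate(databricksDataSource);
    }
}
```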

The issue is that if I leave the running server idle for more than 20 minutes, I start getting the error below from Databricks.

[Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Invalid SessionHandle: SessionHandle [56eea0f9-f7ee-400e-b2f6-5e9cec51c2a4]:62:61, org.apache.hive.service.cli.session.SessionManager:getSession:SessionManager.java:349, org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager:getSession:SparkSQLSessionManager.scala:179, com.databricks.sql.hive.thriftserver.thrift.ThriftHandlerUtils$:getSessionHandle:ThriftHandlerUtils.scala:55,

 

Please let me know if anyone has faced this issue and what the resolution for it is.

 

 

 

3 REPLIES

AbhiJ
New Contributor III

@Retired_mod ,

The links you pasted don't seem to work.

Anyway, please find my responses to your suggestions below.

1. Check if the session handle is valid and active. If the session has expired, a new session needs to be created.

I am using commons connection pooling, which creates a singleton datasource object with connections created in advance.

Please note that initially everything works fine and I am able to fetch the data from Databricks.

The issue arises when I leave the server idle for some 20 minutes after the initial fetch (with the server still running) and then try to fetch data from Databricks again via the same datasource (pooled connection) that was created initially.

I was expecting the Databricks JDBC connector jar to handle this error and create a new connection when an existing connection goes stale.

2. Check if there are any issues with the session manager. This can be done by checking the logs of the session manager and verifying if it is running correctly.

I see the same errors in the Databricks log4j logs (Driver logs tab of the compute cluster section).

3. Verify if the JDBC driver is compatible with the Databricks version. If not, update the driver to a compatible version.

I am using 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12), and since I am using the latest Databricks JDBC version from Maven (2.6.33), it should include support for 12.2 LTS.
 

4. Check if there are any network connectivity issues between the client and the Databricks cluster. This can be done by running a ping test or using a network monitoring tool.

There are no network issues. As I mentioned initially, the first fetch works fine; only after some idle time does Databricks throw the invalid session handle error.

 

Also, I would like to mention that if I open a JDBC connection and close it for every request, it works fine irrespective of idle time. But I want to leverage the benefit of connection pooling (already created connections) instead of creating and closing a connection for every request.

I think this error needs to be handled inside the Databricks connector jar.
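One common way to get the pool itself to recover from stale sessions, assuming DBCP2's BasicDataSource is in use, is to enable validation on borrow and to evict idle connections before the server-side session timeout hits them. A sketch (the method name and the specific timeout values are illustrative):

```java
import java.util.concurrent.TimeUnit;

import org.apache.commons.dbcp2.BasicDataSource;

public final class DatabricksPool {

    // Illustrative helper: builds a DBCP2 pool that validates connections
    // before use and retires idle ones before the server can expire them.
    public static BasicDataSource pooledDatabricksDataSource(String jdbcUrl) {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.databricks.client.jdbc.Driver");
        ds.setUrl(jdbcUrl);

        // Probe each connection before handing it out; a connection whose
        // server-side session has expired fails the probe and is replaced.
        ds.setValidationQuery("SELECT 1");
        ds.setTestOnBorrow(true);

        // Proactively retire connections idle longer than 10 minutes,
        // i.e. before the ~15-minute server-side idle session timeout.
        ds.setTestWhileIdle(true);
        ds.setTimeBetweenEvictionRunsMillis(TimeUnit.MINUTES.toMillis(1));
        ds.setMinEvictableIdleTimeMillis(TimeUnit.MINUTES.toMillis(10));
        return ds;
    }
}
```

With testOnBorrow enabled there is a small per-request validation round-trip, but the pool transparently replaces dead connections instead of surfacing the invalid SessionHandle error to the application.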


 

NandiniN
Databricks Employee

Hi @AbhiJ ,

Looking at the error, it appears the session is invalid/expired.

The error usually occurs when the JDBC/ODBC client re-accesses the cluster with the same session ID after the idle session timeout threshold. Restarting the client is a workaround. You can also set the following property in the cluster's Spark config: `spark.hadoop.hive.server2.idle.session.timeout`.

The default value is 900000 ms (15 minutes), and you mentioned it happens after about 20 minutes.
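For reference, if you raise this timeout in the cluster's Spark config (Compute > your cluster > Advanced options > Spark), the entry would look like the following; the 30-minute value here is just an example:

```
spark.hadoop.hive.server2.idle.session.timeout 1800000
```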


Please let me know if that helps. 

Thanks!

ash42
New Contributor II

I am seeing the same issue with Hikari. When a pooled connection is created and the Databricks cluster is then terminated (or restarted), the HikariDataSource retains a stale session handle.

Why does connection.isValid() return true when executing any query on the same connection fails with an invalid SessionHandle? I was expecting connection.isValid() to return false in this case.

To reproduce:

  1. Create a HikariDataSource referencing a Databricks cluster.
  2. Terminate the Databricks cluster.
  3. Get a connection from the datasource via HikariDataSource.getConnection().
  4. connection.isValid(1) returns true.
  5. Execute a query on the connection, e.g. connection.createStatement().executeQuery("SELECT 1"); this blocks until the cluster is restarted and then fails with java.sql.SQLException: [Databricks][JDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: 08000, Query: SELECT 1, Error message from Server: Invalid SessionHandle.
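For anyone hitting the same thing: since the driver's Connection.isValid() apparently does not detect the dead session, one workaround is to make Hikari validate with a real test query and retire connections before the server-side timeout. A sketch, with illustrative timeout values:

```java
import java.util.concurrent.TimeUnit;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public final class DatabricksHikariPool {

    // Illustrative helper: Hikari pool configured to detect stale sessions.
    public static HikariDataSource build(String jdbcUrl) {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl(jdbcUrl); // the Databricks JDBC URL

        // Force a real round-trip test query instead of Connection.isValid(),
        // so a stale SessionHandle is caught when the connection is borrowed.
        cfg.setConnectionTestQuery("SELECT 1");

        // Retire connections well before the server-side idle session
        // timeout (~15 minutes by default).
        cfg.setMaxLifetime(TimeUnit.MINUTES.toMillis(10));
        cfg.setIdleTimeout(TimeUnit.MINUTES.toMillis(5));
        return new HikariDataSource(cfg);
    }
}
```

Note that setting connectionTestQuery disables Hikari's default isValid()-based check, which is exactly what we want here, at the cost of one extra round trip per borrow.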
