Application connects to Databricks serverless SQL warehouse via Databricks JDBC driver. It executes SQL select statements only. We see small number of statements failed each day with the following error detail:
java.sql.SQLException: [Databricks][JDBCDriver](500157) Socket timeout expired when executing query: SELECT COA***.; caused by com.databricks.client.support.exceptions.GeneralException: [Databricks][JDBCDriver](500157) Socket timeout expired when executing query: SELECT COA***.; caused by com.databricks.client.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out; caused by java.net.SocketTimeoutException: Read timed out
I can provide full stack trace on request.
Observations around failed statements:
- The statements fail typically after 10 seconds but there were several failures also after 15 or 16 seconds.
- Execution never gets into fetching phase.
- Failed statements continue execution on server side. Statements fail on server-side eventually, but it takes even 10 minutes or more. That time is reported as fetching time.
- When the failed statement is executed from sql console, it pass and data are fetched quickly as most of the queries returns low number of records.
- Failures happen during high load. serverless SQL warehouse is on its maximal configuration, i.e. no scale down or up during failures. If the HW is switched to more powerful nodes, it has no effect.
JDBC driver configuration:
- Driver version 2.6.38, happens also with 2.6.36
- Authentication by PAT
- SocketTimeOut either 0 or 300, happens for both values
- Preset catalog and schema
- Connection initialized with "SET ansi_mode = false"
Connection pool settings:
- HikariCP version 5.1.0
- Maximally 30 parallel connections to the SQL warehouse
I am looking for any advice how to mitigate the failures.