How to fix intermittent 503 errors in 10.4 LTS

ebyhr
New Contributor II

I have recently started getting the error below intermittently in version 10.4 LTS. Is there a way to fix this intermittent failure? I added retry logic to our code, but the Databricks query had actually succeeded even though the driver threw an exception, so retrying leaves the table in an unexpected state.

The error message:

[Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.

The full stacktrace:

io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
at io.trino.tests.product.utils.QueryExecutors$3.lambda$executeQuery$0(QueryExecutors.java:149)
at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
at net.jodah.failsafe.Execution.executeSync(Execution.java:129)
at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:67)
at io.trino.tests.product.utils.QueryExecutors$3.executeQuery(QueryExecutors.java:149)
at io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility$CaseTestTable.<init>(TestDeltaLakeWriteDatabricksCompatibility.java:366)
at io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility.testCaseUpdateInPartition(TestDeltaLakeWriteDatabricksCompatibility.java:160)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
at com.databricks.client.hivecommon.api.HS2Client.handleTTransportException(Unknown Source)
at com.databricks.client.spark.jdbc.DowloadableFetchClient.handleTTransportException(Unknown Source)
at com.databricks.client.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
at com.databricks.client.hivecommon.api.HS2Client.executeStatement(Unknown Source)
at com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeRowCountQueryHelper(Unknown Source)
at com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
at com.databricks.client.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.databricks.client.jdbc.common.BaseStatement.execute(Unknown Source)
at com.databricks.client.hivecommon.jdbc42.Hive42Statement.execute(Unknown Source)
at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:128)
at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
... 24 more
Suppressed: java.lang.Exception: Query: INSERT INTO default.update_case_compat_zk3lu03mfzd5 VALUES (1, 1, 0), (2, 2, 0), (3, 3, 1)
at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
... 25 more
Caused by: com.databricks.client.support.exceptions.ErrorException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
... 35 more
Caused by: com.databricks.client.jdbc42.internal.apache.thrift.transport.TTransportException: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown
at com.databricks.client.hivecommon.HttpRetrySettings.shouldRetry(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.shouldReexecuteRequest(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.executeWithRetry(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.ExecuteStatement(Unknown Source)
... 33 more
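
On the retry problem described above (the statement succeeds on the server even though the driver throws), one possible workaround is to verify whether the write already took effect before re-running it. The following is only a minimal sketch under assumptions, not the driver's or Trino's actual behavior: the `VerifyingRetry` class, the `SqlAction` and `SqlCheck` interfaces, and the `executeWithVerification` method are all hypothetical names, and the check itself (for example a row-count query against the target table) has to be supplied by the caller.

```java
import java.sql.SQLException;

// Hypothetical helper: retries a statement only if a caller-supplied check
// says the previous attempt did NOT take effect on the server.
public final class VerifyingRetry {

    /** The statement to run, e.g. a wrapper around java.sql.Statement.execute(...). */
    @FunctionalInterface
    public interface SqlAction {
        void run() throws SQLException;
    }

    /** Returns true if the effect of the statement is already visible (assumed check). */
    @FunctionalInterface
    public interface SqlCheck {
        boolean alreadyApplied() throws SQLException;
    }

    private VerifyingRetry() {}

    /**
     * Runs the action up to maxAttempts times.
     * After each failure, the check is consulted; if the statement already
     * took effect, the failure is treated as a success and no retry happens.
     *
     * @return the number of times the action was actually executed
     */
    public static int executeWithVerification(SqlAction action, SqlCheck check, int maxAttempts)
            throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (SQLException e) {
                last = e;
                // The server may have committed the write before the 503 was returned.
                if (check.alreadyApplied()) {
                    return attempt;
                }
            }
        }
        // Every attempt failed and the effect was never observed.
        throw last;
    }
}
```

For the INSERT in the stack trace, the check could compare the table's row count with the expected value before deciding to retry. The sketch leaves out backoff between attempts, and it only helps when the effect of the statement can be observed reliably from the client.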