10-02-2022 11:06 PM
Recently I have intermittently been getting the error below on version 10.4 LTS. Is there a way to fix this intermittent failure? I added retry logic to our code, but the Databricks query had actually succeeded (even though it threw an exception), so the retry leaves the table in an unexpected state.
The error message:
[Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
The full stacktrace:
io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
at io.trino.tests.product.utils.QueryExecutors$3.lambda$executeQuery$0(QueryExecutors.java:149)
at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
at net.jodah.failsafe.Execution.executeSync(Execution.java:129)
at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:67)
at io.trino.tests.product.utils.QueryExecutors$3.executeQuery(QueryExecutors.java:149)
at io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility$CaseTestTable.<init>(TestDeltaLakeWriteDatabricksCompatibility.java:366)
at io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility.testCaseUpdateInPartition(TestDeltaLakeWriteDatabricksCompatibility.java:160)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
at com.databricks.client.hivecommon.api.HS2Client.handleTTransportException(Unknown Source)
at com.databricks.client.spark.jdbc.DowloadableFetchClient.handleTTransportException(Unknown Source)
at com.databricks.client.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
at com.databricks.client.hivecommon.api.HS2Client.executeStatement(Unknown Source)
at com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeRowCountQueryHelper(Unknown Source)
at com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
at com.databricks.client.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.databricks.client.jdbc.common.BaseStatement.execute(Unknown Source)
at com.databricks.client.hivecommon.jdbc42.Hive42Statement.execute(Unknown Source)
at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:128)
at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
... 24 more
Suppressed: java.lang.Exception: Query: INSERT INTO default.update_case_compat_zk3lu03mfzd5 VALUES (1, 1, 0), (2, 2, 0), (3, 3, 1)
at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
... 25 more
Caused by: com.databricks.client.support.exceptions.ErrorException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
... 35 more
Caused by: com.databricks.client.jdbc42.internal.apache.thrift.transport.TTransportException: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown
at com.databricks.client.hivecommon.HttpRetrySettings.shouldRetry(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.shouldReexecuteRequest(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.executeWithRetry(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.ExecuteStatement(Unknown Source)
... 33 more
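For anyone who wants to react to this specific failure: the 503 shows up both in the top-level `SQLException` message (driver code 500593) and in the nested `TTransportException`. Below is a minimal sketch that walks the cause chain and flags such an exception as "outcome unknown" rather than "definitely failed". It assumes the condition can only be recognised from the message text. `AmbiguousFailureDetector` is a hypothetical helper, not part of the Databricks driver or tempto.

```java
/**
 * Hypothetical helper (not part of the Databricks JDBC driver or tempto).
 * Assumption: the "communication link failure / HTTP 503" condition is only
 * recognisable from the exception messages, so we match on substrings.
 */
final class AmbiguousFailureDetector {
    // Upper bound on how deep we follow getCause(), guarding against cyclic chains.
    private static final int MAX_DEPTH = 32;

    private AmbiguousFailureDetector() {}

    /**
     * Returns true if any exception in the cause chain mentions driver code 500593
     * or an HTTP 503. Such a failure may have happened after the server already
     * applied the statement, so its outcome should be treated as unknown.
     */
    static boolean isAmbiguousCommunicationFailure(Throwable error) {
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            String message = current.getMessage();
            if (message != null
                    && (message.contains("(500593)") || message.contains("HTTP Response code: 503"))) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }
}
```

Matching on message text is brittle across driver versions, so it is worth keeping this check in one place.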
10-03-2022 12:23 PM
Maybe you can add an extra validation step on the output (e.g., check that the object exists). You could also share your code.
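A sketch of that idea: after a failed attempt, run a caller-supplied check (for example a `SELECT count(*)` against the target table) and re-run the statement only if its effect is not visible yet. `VerifyingRetry` and `runWithVerification` are hypothetical names. The sketch assumes the check is cheap and reliable, which may not hold while the cluster is returning 503s.

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;

/** Hypothetical "verify before retrying" helper; not part of any library. */
final class VerifyingRetry {
    private VerifyingRetry() {}

    /**
     * @param action         the statement to run (e.g. an INSERT executed via JDBC)
     * @param alreadyApplied returns true if the action's effect is already visible
     * @param maxAttempts    total number of attempts, at least 1
     * @return how many times {@code action} was invoked
     * @throws Exception the last failure, if every attempt failed and the effect never appeared
     */
    static int runWithVerification(Callable<?> action, BooleanSupplier alreadyApplied, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int invocations = 0;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            invocations++;
            try {
                action.call();
                return invocations;
            } catch (Exception e) {
                last = e;
                // The server may have committed the statement even though the client saw an error,
                // so check the data before deciding to run it again.
                if (alreadyApplied.getAsBoolean()) {
                    return invocations;
                }
            }
        }
        throw last;
    }
}
```

The design choice is to treat an exception as "outcome unknown" and let the data decide. This avoids duplicate writes at the cost of one extra query per failure.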
10-03-2022 02:06 PM
Unfortunately, we can't. There is a lot of code, and the place where it fails isn't deterministic. https://github.com/trinodb/trino/issues/14391
The code is https://github.com/trinodb/tempto/blob/a3f013ae9faae1848972a25db40ba041c83b69d7/tempto-core/src/main.... It simply executes the query; nothing special.
10-27-2022 01:41 AM
I'm experiencing the same issue.
Caused by: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
I've tried retrying on the client side with the Failsafe library, but when the failure happens on an `INSERT` statement, the retry results in a duplicate `INSERT`.
It seems that error code 500593 actually indicates that the operation took longer than expected.
I'm wondering: can this be avoided by specifying a longer timeout?
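One way to avoid the duplicate `INSERT` is to apply client-side retries only to statements that are safe to repeat. Below is a rough sketch that classifies a statement by its leading SQL keyword. It assumes that this is good enough: it ignores leading comments and CTEs, and treats anything unrecognised as unsafe. `RetrySafety.isRetrySafe` is a hypothetical helper that could serve as the condition for whether the retry policy is used at all.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper deciding whether a statement can be blindly re-executed. */
final class RetrySafety {
    // Assumption: statements starting with these keywords only read data,
    // so re-running them after an ambiguous failure cannot duplicate rows.
    private static final Set<String> READ_ONLY_KEYWORDS =
            Set.of("SELECT", "SHOW", "DESCRIBE", "DESC", "EXPLAIN");

    private RetrySafety() {}

    static boolean isRetrySafe(String sql) {
        if (sql == null) {
            return false;
        }
        String trimmed = sql.strip();
        // Take the leading run of letters as the statement keyword.
        int end = 0;
        while (end < trimmed.length() && Character.isLetter(trimmed.charAt(end))) {
            end++;
        }
        String keyword = trimmed.substring(0, end).toUpperCase(Locale.ROOT);
        // INSERT, UPDATE, MERGE, DELETE and anything unrecognised are treated as unsafe.
        return READ_ONLY_KEYWORDS.contains(keyword);
    }
}
```

Writes that fail with the 503 would then surface to the caller (or go through a verify-then-retry step) instead of being re-executed automatically.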
10-28-2022 11:17 PM
Hi @Yuya Ebihara
Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
10-30-2022 07:59 PM
The issue still happens.