Data Engineering

How to fix intermittent 503 errors in 10.4 LTS

ebyhr
New Contributor II

I have recently started getting the error below intermittently in version 10.4 LTS. Is there any way to fix this intermittent failure? I added retry logic to our code, but the Databricks query had actually succeeded (even though the driver threw an exception), so the retry left the table in an unexpected state.

The error message:

[Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.

The full stacktrace:

io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
at io.trino.tests.product.utils.QueryExecutors$3.lambda$executeQuery$0(QueryExecutors.java:149)
at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
at net.jodah.failsafe.Execution.executeSync(Execution.java:129)
at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:67)
at io.trino.tests.product.utils.QueryExecutors$3.executeQuery(QueryExecutors.java:149)
at io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility$CaseTestTable.<init>(TestDeltaLakeWriteDatabricksCompatibility.java:366)
at io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility.testCaseUpdateInPartition(TestDeltaLakeWriteDatabricksCompatibility.java:160)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
at com.databricks.client.hivecommon.api.HS2Client.handleTTransportException(Unknown Source)
at com.databricks.client.spark.jdbc.DowloadableFetchClient.handleTTransportException(Unknown Source)
at com.databricks.client.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
at com.databricks.client.hivecommon.api.HS2Client.executeStatement(Unknown Source)
at com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeRowCountQueryHelper(Unknown Source)
at com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
at com.databricks.client.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.databricks.client.jdbc.common.BaseStatement.execute(Unknown Source)
at com.databricks.client.hivecommon.jdbc42.Hive42Statement.execute(Unknown Source)
at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:128)
at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
... 24 more
Suppressed: java.lang.Exception: Query: INSERT INTO default.update_case_compat_zk3lu03mfzd5 VALUES (1, 1, 0), (2, 2, 0), (3, 3, 1)
at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
... 25 more
Caused by: com.databricks.client.support.exceptions.ErrorException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
... 35 more
Caused by: com.databricks.client.jdbc42.internal.apache.thrift.transport.TTransportException: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown
at com.databricks.client.hivecommon.HttpRetrySettings.shouldRetry(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.shouldReexecuteRequest(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.executeWithRetry(Unknown Source)
at com.databricks.client.hivecommon.api.HS2ClientWrapper.ExecuteStatement(Unknown Source)
... 33 more

5 REPLIES

Hubert-Dudek
Esteemed Contributor III

Maybe you can add extra validation of the outcome (for example, check that the object exists). You could also share your code.
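One way to read the validation idea is "check before you retry": after a failed call, ask whether the statement took effect anyway, and only re-run it if it did not. The following is a minimal sketch under that assumption, using only the JDK. The names `VerifiedRetry`, `SqlAction`, `executeOnce` and `alreadyApplied` are hypothetical and not part of the Databricks JDBC driver or tempto. What `alreadyApplied` actually runs is up to the caller, for example a row count query or a look at the Delta table history.

```java
import java.sql.SQLException;
import java.util.function.BooleanSupplier;

public class VerifiedRetry {
    /** A statement execution that may fail, e.g. a wrapper around Statement.execute(sql). */
    @FunctionalInterface
    public interface SqlAction {
        void run() throws SQLException;
    }

    /**
     * Runs the action at most maxAttempts times and returns the number of attempts made.
     * After each failure it asks alreadyApplied (a caller-supplied check, hypothetical here)
     * whether the statement took effect on the server anyway; if so, it stops
     * instead of running the statement a second time.
     */
    public static int executeOnce(SqlAction action, BooleanSupplier alreadyApplied, int maxAttempts)
            throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (SQLException e) {
                last = e;
                if (alreadyApplied.getAsBoolean()) {
                    // The client saw an error, but the write is visible: do not retry.
                    return attempt;
                }
            }
        }
        // Every attempt failed and the check never saw the write.
        throw last;
    }
}
```

The check is only as reliable as the query behind it. If the statement is still running on the cluster when the check runs, the write may appear later, so this reduces duplicate writes rather than ruling them out.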

ebyhr
New Contributor II

Unfortunately, we can't. There is a lot of code, and the place where it fails isn't deterministic. https://github.com/trinodb/trino/issues/14391

The code is at https://github.com/trinodb/tempto/blob/a3f013ae9faae1848972a25db40ba041c83b69d7/tempto-core/src/main.... It simply executes the query, nothing special.

findinpath
Contributor

I'm experiencing the same problem.

Caused by: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.

I tried retrying on the client side with the failsafe library, but when the failure happens on an `INSERT` statement, the retry ends up executing the `INSERT` twice.

It seems that error code 500593 actually signals that the operation took longer than expected.

I'm wondering whether this situation can be avoided by specifying a longer timeout?
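For the duplicate `INSERT` itself, one option might be to make the write safe to repeat, so that a retry after a 503 cannot add the same rows twice. A minimal sketch, assuming the target is a Delta table and that one column uniquely identifies a row: build a `MERGE INTO ... WHEN NOT MATCHED THEN INSERT` statement instead of a plain `INSERT`. The class `IdempotentInsert`, the method `mergeSql` and the column names in the usage line are hypothetical; the real column names of the test table are not shown in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IdempotentInsert {
    /**
     * Builds a MERGE statement that inserts only rows whose key is not yet in the table.
     * Inputs are concatenated as-is, so they must come from trusted code, not user input.
     *
     * @param table     fully qualified table name
     * @param columns   column names in the order used by the row tuples
     * @param keyColumn column assumed to identify a row uniquely
     * @param rowTuples SQL tuples such as "(1, 1, 0)"
     */
    public static String mergeSql(String table, List<String> columns, String keyColumn,
                                  List<String> rowTuples) {
        String columnList = String.join(", ", columns);
        String sourceColumns = columns.stream()
                .map(c -> "source." + c)
                .collect(Collectors.joining(", "));
        return "MERGE INTO " + table + " AS target"
                + " USING (SELECT * FROM VALUES " + String.join(", ", rowTuples)
                + " AS v(" + columnList + ")) AS source"
                + " ON target." + keyColumn + " = source." + keyColumn
                + " WHEN NOT MATCHED THEN INSERT (" + columnList + ")"
                + " VALUES (" + sourceColumns + ")";
    }
}
```

A call such as `IdempotentInsert.mergeSql("default.some_table", List.of("id", "a_number", "part"), "id", List.of("(1, 1, 0)", "(2, 2, 0)"))` produces a statement that can be sent through the same retry path: if the first attempt did land, the second one matches every key and inserts nothing. This only helps when such a key exists; it does not answer the timeout question.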

Anonymous
Not applicable

Hi @Yuya Ebihara

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

ebyhr
New Contributor II

The issue still happens.
