Using ODBC or JDBC to read from a table fails whenever I include an ORDER BY clause. In one sample case, I have a fairly small table (just 1,946 rows):
select *
from some_table
order by some_field
Result:
java.lang.IllegalArgumentException: requirement failed: Subquery subquery#485, [id=#937] has not finished
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:53)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:445)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:269)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:123)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:48)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:54)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:247)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:232)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:281)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.sql.execution.ScalarSubquery.eval(subquery.scala:100)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:160)
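For reference, here's roughly how I'm issuing the query over JDBC. This is a minimal sketch: the hostname, HTTP path, and access token in the connection string are placeholders, and the URL format is the one I understand the Simba Spark JDBC driver (2.6.x) to expect.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OrderByRepro {
    public static void main(String[] args) throws Exception {
        // Placeholder URL for the Simba Spark JDBC driver; every bracketed
        // value below is a stand-in for the real connection details.
        String url = "jdbc:spark://<server-hostname>:443/default"
                + ";transportMode=http;ssl=1"
                + ";httpPath=<http-path>"
                + ";AuthMech=3;UID=token;PWD=<access-token>";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // The same query without the ORDER BY works; adding it
             // triggers the IllegalArgumentException above.
             ResultSet rs = stmt.executeQuery(
                     "select * from some_table order by some_field")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}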
This seems pretty weird to me: I can't run a simple ORDER BY clause?
I've tried this with both the ODBC driver and the JDBC driver (2.6.32). With JDBC, I tried all three modes for UseNativeQuery (0, 1, and 2) just to exhaust the options; the connection strings are sketched below. I don't have a query timeout specified, though setting a large timeout value made no difference.
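In case it matters, this is how I was toggling the setting, replacing the url assignment in the repro sketch above. My understanding of the values (from the driver documentation) is that 0 lets the driver rewrite the SQL, 1 passes the query through unmodified, and 2 leaves the choice to the driver.

// Same placeholder base URL as in the sketch above.
String base = "jdbc:spark://<server-hostname>:443/default"
        + ";transportMode=http;ssl=1;httpPath=<http-path>"
        + ";AuthMech=3;UID=token;PWD=<access-token>";

String translated  = base + ";UseNativeQuery=0";  // driver rewrites the SQL
String passthrough = base + ";UseNativeQuery=1";  // SQL sent as-is
String automatic   = base + ";UseNativeQuery=2";  // driver decides

All three produce the same error.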
The immediate impression is that there's something wrong with the JDBC driver, though of course it would be all the simpler if I'm just missing something obvious; it seems a surprising bug to find. If this really is a bug, where would I go to report it?